[CASSANDRA-1232] UTF8Type.compare() is slow and dangerous - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 0.6.4
Component/s: None
Labels: None

Description

UTF8Type converts both byte arrays into Strings and then compares them. This is unnecessary and slow because UTF-8 encoded strings are already directly comparable as bytes: a higher code point always encodes to a byte sequence that sorts higher when the bytes are compared as unsigned values. One can safely use BytesType.compare() for UTF-8. Maybe UTF8Type should be a subclass that only overrides getString().
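A minimal sketch of the byte-wise comparison described above. It is not the attached patch and not Cassandra's BytesType code; the class name Utf8ByteOrder and the method compareUnsigned are hypothetical. It assumes both arrays hold valid UTF-8 and treats every byte as unsigned, which the ordering argument depends on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Cassandra).
final class Utf8ByteOrder {
    private Utf8ByteOrder() {}

    /** Lexicographic comparison that treats each byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // mask to undo Java's signed byte
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] ascii = "a".getBytes(StandardCharsets.UTF_8);        // U+0061, one byte
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_8); // U+00E9, two bytes
        // No String is built during the comparison itself.
        System.out.println(compareUnsigned(ascii, accented) < 0);
    }
}
```

The mask matters because Java bytes are signed: without it, the lead byte 0xC3 of a two-byte sequence would compare as negative and sort before every ASCII character.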

BTW, it's also dangerous to ignore invalid byte sequences. At this point the byte array should contain valid UTF-8.
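For illustration, one way to reject invalid sequences instead of ignoring them is a strict decoder from the standard java.nio.charset API. This is a sketch under assumptions, not what the attached patch does; the class name StrictUtf8 and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Cassandra).
final class StrictUtf8 {
    private StrictUtf8() {}

    /** Decodes UTF-8 and throws on malformed input instead of substituting characters. */
    static String decode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Returns true only if the whole array is well-formed UTF-8. */
    static boolean isValid(byte[] bytes) {
        try {
            decode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "key".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xC3, (byte) 0x28}; // lead byte followed by a non-continuation byte
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

By contrast, new String(bytes, StandardCharsets.UTF_8) silently replaces malformed input with U+FFFD, so two different invalid byte arrays can decode to the same String.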

Attachments

0001-Fixes-to-UTF8Type-compare-and-getString-methods.patch
30/Jun/10 16:43
2 kB
Nick Bailey

Issue Links

is related to

CASSANDRA-1196 Invalid UTF-8 keys [for legacy OPP] should cause exceptions

Resolved

People

Assignee: Nick Bailey

Reporter: Folke Behrens

Authors: Nick Bailey

Votes: 0

Watchers: 3

Dates

Created: 26/Jun/10 17:52

Updated: 16/Apr/19 09:33

Resolved: 01/Jul/10 01:32