  1. Apache Cassandra
  2. CASSANDRA-1232

UTF8Type.compare() is slow and dangerous

Details

    • Improvement
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 0.6.4
    • None
    • None

    Description

      UTF8Type converts both byte arrays into Strings and then compares them. This is unnecessary and slow because UTF-8 encoded strings are directly comparable as byte sequences: when bytes are compared as unsigned values, higher code points yield higher initial and subsequent bytes. One can safely use BytesType.compare() for UTF-8. Maybe UTF8Type should be a subclass of BytesType that only overrides getString().
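
      To illustrate the ordering claim, here is a minimal sketch that compares UTF-8 byte arrays as unsigned bytes and cross-checks the result against code point order. The class and method names (Utf8ByteOrder, compareUnsigned, compareCodePoints) are hypothetical and are not taken from the Cassandra code base; whether BytesType.compare() behaves exactly like compareUnsigned is an assumption here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra: illustrates that unsigned
// byte-wise order of UTF-8 matches Unicode code point order.
final class Utf8ByteOrder {

    // Lexicographic comparison, treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // Equal up to the shorter length: the shorter array sorts first.
        return Integer.signum(Integer.compare(a.length, b.length));
    }

    // Reference comparison by Unicode code point, used only as a cross-check.
    static int compareCodePoints(String s, String t) {
        int[] a = s.codePoints().toArray();
        int[] b = t.codePoints().toArray();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        return Integer.signum(Integer.compare(a.length, b.length));
    }

    public static void main(String[] args) {
        // ASCII, 2-byte, 3-byte and 4-byte (supplementary) samples.
        String[] samples = {"", "a", "ab", "b", "\u00E9", "\u20AC", "\uFFFD", "\uD83D\uDE00"};
        for (String s : samples) {
            for (String t : samples) {
                int byBytes = compareUnsigned(
                        s.getBytes(StandardCharsets.UTF_8),
                        t.getBytes(StandardCharsets.UTF_8));
                if (byBytes != compareCodePoints(s, t)) {
                    throw new IllegalStateException("order mismatch: " + s + " vs " + t);
                }
            }
        }
        System.out.println("byte order and code point order agree on all sample pairs");
    }
}
```

      Note that this is code point order. String.compareTo() compares UTF-16 code units, so the two orders can differ when supplementary characters (outside the Basic Multilingual Plane) are involved, for example U+FFFD versus U+1F600.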

      BTW, it's also dangerous to ignore invalid byte sequences. At this point the byte array should already contain valid UTF-8.
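
      As a sketch of what rejecting invalid input could look like, the following uses the standard java.nio.charset API to decode strictly instead of silently substituting replacement characters. StrictUtf8, decodeStrict, and isValid are hypothetical names chosen for illustration; this is not the fix that was committed for this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra: strict UTF-8 decoding.
final class StrictUtf8 {

    // Decodes the bytes as UTF-8 and throws on any malformed sequence.
    // By contrast, new String(bytes, StandardCharsets.UTF_8) replaces
    // malformed input with U+FFFD and never reports an error.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Returns true only if the whole array is well-formed UTF-8.
    static boolean isValid(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded in UTF-8
        byte[] truncated = {(byte) 0xC3};         // lead byte without continuation
        System.out.println("good: " + isValid(good));
        System.out.println("truncated: " + isValid(truncated));
    }
}
```

      A check like this would typically run once when data is written, so that compare() can assume valid input and stay a plain byte comparison.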

            People

              nickmbailey Nick Bailey
              messi Folke Behrens
              Nick Bailey