Lucene - Core / LUCENE-1434

IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.9
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Provides support for converting byte sequences to Strings that can be used as index terms, and back again. The resulting Strings preserve the original byte sequences' sort order (assuming the bytes are interpreted as unsigned).
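      The "interpreted as unsigned" qualifier matters because Java's byte is signed. Under a signed comparison 0x80 (-128) would sort before 0x7F (127), which is not the order wanted here. The following is a minimal sketch of the unsigned lexicographic order the encoded Strings are meant to preserve, with a signed comparison alongside for contrast. The class and method names are hypothetical and are not part of Lucene:

```java
public class UnsignedByteOrder {
    // Lexicographic comparison treating each byte as unsigned (0..255);
    // when one array is a proper prefix of the other, the shorter one sorts first.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // The same loop using Java's default signed bytes (-128..127), for contrast.
    public static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = a[i] - b[i];
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] low = {0x7F};          // 127 under either interpretation
        byte[] high = {(byte) 0x80};  // 128 unsigned, -128 signed
        System.out.println(compareUnsigned(low, high) < 0); // true: 0x7F sorts before 0x80
        System.out.println(compareSigned(low, high) < 0);   // false: the signed view puts 0x80 first
    }
}
```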

      The Strings are constructed using a Base 8000h encoding of the original binary data: each char of an encoded String represents a 15-bit chunk of the byte sequence. Base 8000h was chosen because it allows all 15 low-order bits of a char to be used without restriction. Using the char's high bit as well would require complicated handling to avoid the surrogate range [U+D800-U+DFFF], whose code units do not represent valid characters on their own.
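      To make the 15-bits-per-char idea concrete, here is a minimal sketch of such a codec. It is not Lucene's IndexableBinaryStringTools, and the class and method names are hypothetical. Two details are assumptions of this sketch, not taken from the description above. First, each 15-bit chunk is stored as its value plus 16, so data chars run from 0x0010 to 0x800F, still well below U+D800. Second, a final char in the range 1..15 records how many bits of the last chunk are real data. The trailer makes decoding exact, and because it always sorts below every data char, String.compareTo order still matches unsigned byte order even when one input is the other plus trailing zero bytes. The demo needs Java 9+ for Arrays.compareUnsigned.

```java
import java.util.Arrays;

public class Base8000hSketch {
    private static final int CHUNK_BITS = 15;
    private static final int CHUNK_MASK = 0x7FFF;
    // Assumption of this sketch: data chars are chunk + 16, so the trailer (1..15) sorts below them.
    private static final int DATA_OFFSET = 16;

    public static String encode(byte[] bytes) {
        if (bytes.length == 0) {
            return ""; // the empty input sorts first, so it maps to the empty String
        }
        StringBuilder sb = new StringBuilder();
        int acc = 0;     // pending bits, right-aligned
        int accBits = 0; // number of pending bits held in acc (always < 15 between bytes)
        for (byte b : bytes) {
            acc = (acc << 8) | (b & 0xFF); // append the byte as unsigned
            accBits += 8;
            if (accBits >= CHUNK_BITS) {
                accBits -= CHUNK_BITS;
                sb.append((char) (((acc >>> accBits) & CHUNK_MASK) + DATA_OFFSET));
                acc &= (1 << accBits) - 1; // drop the bits just emitted
            }
        }
        int lastChunkBits = CHUNK_BITS;
        if (accBits > 0) {
            // Left-align the leftover bits, zero-padding the final chunk.
            sb.append((char) ((acc << (CHUNK_BITS - accBits)) + DATA_OFFSET));
            lastChunkBits = accBits;
        }
        sb.append((char) lastChunkBits); // trailer: valid bits in the last chunk (1..15)
        return sb.toString();
    }

    // No validation: the input is assumed to come from encode().
    public static byte[] decode(String s) {
        if (s.isEmpty()) {
            return new byte[0];
        }
        int chunks = s.length() - 1;               // the last char is the trailer
        int lastChunkBits = s.charAt(chunks);
        int totalBits = CHUNK_BITS * (chunks - 1) + lastChunkBits;
        byte[] out = new byte[totalBits / 8];
        int acc = 0, accBits = 0, pos = 0;
        for (int i = 0; i < chunks; i++) {
            acc = (acc << CHUNK_BITS) | (s.charAt(i) - DATA_OFFSET);
            accBits += CHUNK_BITS;
            // Emit whole bytes; padding bits in the final chunk are never emitted.
            while (accBits >= 8 && pos < out.length) {
                accBits -= 8;
                out[pos++] = (byte) (acc >>> accBits);
                acc &= (1 << accBits) - 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] fourteen = new byte[14];
        Arrays.fill(fourteen, (byte) 0x5A);
        byte[] fifteen = Arrays.copyOf(fourteen, 15);   // appends one 0x00 byte
        byte[] seventeen = Arrays.copyOf(fifteen, 17);  // appends two more 0x00 bytes
        byte[][] samples = {{}, {0x00}, {0x7F}, {(byte) 0x80}, {(byte) 0xFF},
                {0x01, 0x00}, fourteen, fifteen, seventeen};
        for (byte[] a : samples) {
            if (!Arrays.equals(a, decode(encode(a)))) throw new AssertionError("round trip");
            for (byte[] b : samples) {
                int byteOrder = Integer.signum(Arrays.compareUnsigned(a, b));
                int stringOrder = Integer.signum(encode(a).compareTo(encode(b)));
                if (byteOrder != stringOrder) throw new AssertionError("order mismatch");
            }
        }
        System.out.println("round-trip and ordering checks passed");
    }
}
```

      With this layout an n-byte input becomes ceil(8n/15) data chars plus one trailer char, which is a little over half as many chars as there are input bytes.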

      This class is intended as a mechanism for using CollationKeys as index terms.
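      The CollationKey use case relies on java.text.CollationKey.toByteArray(), whose Javadoc states that the byte arrays (most significant byte first) compare the same way as the keys themselves. The following sketch checks that agreement for a handful of words. The class name, word list, and locale are illustrative choices. In practice the key bytes would then be passed through an order-preserving encoder, such as the one this issue adds, to become index terms:

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyBytesDemo {
    // Unsigned lexicographic byte comparison; a proper prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if comparing key bytes gives the same sign as collator.compare for every pair of words.
    public static boolean bytesAgreeWithCollator(Collator collator, String[] words) {
        for (String x : words) {
            byte[] kx = collator.getCollationKey(x).toByteArray();
            for (String y : words) {
                byte[] ky = collator.getCollationKey(y).toByteArray();
                int viaBytes = Integer.signum(compareUnsigned(kx, ky));
                int viaCollator = Integer.signum(collator.compare(x, y));
                if (viaBytes != viaCollator) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        // Non-ASCII letters are written as escapes so the source file encoding does not matter.
        String[] words = {"apple", "Apfel", "\u00C4pfel", "zebra", "Z\u00FCrich", "\u00E4hnlich"};
        System.out.println("key bytes agree with collator: " + bytesAgreeWithCollator(collator, words));
    }
}
```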

          Activity

          Michael McCandless added a comment -

          This looks good. I plan to commit shortly!

          Michael McCandless added a comment -

          Thanks Steven!

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Steve Rowe
            • Votes:
              0
              Watchers:
              0
