Lucene - Core / LUCENE-4534

WFST/AnalyzingSuggester don't handle keys containing 0 bytes correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      While binary terms containing 0 bytes are rare, they are "allowed", yet they cause exceptions in at least WFSTCompletionLookup and AnalyzingSuggester.

      I think to fix this we should pass custom Comparator to the offline sorter that decodes each BytesRef key and does the actual comparison we want, instead of using separator and relying on BytesRef.compareTo.
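
To illustrate why a separator-based sort key can go wrong, here is a minimal sketch in plain Java. The packed layout (key bytes, one 0 separator byte, 4-byte big-endian weight) and the names `PackedSeparatorDemo`, `pack`, and `compareBytes` are assumptions made for this example, not the actual Lucene encoding. The comparison is unsigned and lexicographic, which is intended to mirror how `BytesRef.compareTo` orders bytes.

```java
import java.util.Arrays;

final class PackedSeparatorDemo {
    // Hypothetical packed layout: key bytes, one 0 separator byte,
    // then a 4-byte big-endian weight.
    static byte[] pack(byte[] key, int weight) {
        byte[] out = Arrays.copyOf(key, key.length + 5);
        // out[key.length] is already 0 and serves as the separator.
        out[key.length + 1] = (byte) (weight >>> 24);
        out[key.length + 2] = (byte) (weight >>> 16);
        out[key.length + 3] = (byte) (weight >>> 8);
        out[key.length + 4] = (byte) weight;
        return out;
    }

    // Unsigned lexicographic comparison; on a shared prefix the shorter array sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] keyA = {0x61};        // "a"
        byte[] keyB = {0x61, 0x00};  // "a" followed by a 0 byte

        // Comparing the keys directly: keyA is a prefix of keyB, so it sorts first.
        System.out.println(compareBytes(keyA, keyB) < 0);                   // true

        // Comparing the packed forms: the 0 byte inside keyB lines up with the
        // separator and weight bytes of keyA, and the order flips.
        System.out.println(compareBytes(pack(keyA, 5), pack(keyB, 1)) < 0); // false
    }
}
```

With these inputs the packed comparison disagrees with the key comparison, so any later step that expects entries in key order would receive them out of order.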

      1. LUCENE-4534.patch
        23 kB
        Michael McCandless
      2. LUCENE-4534.patch
        3 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Patch w/ failing test case for WFSTCompletionLookup and AnalyzingSuggester.

        Michael McCandless added a comment -

        Patch w/ fix.

        Basically, instead of relying on sorting a single "packed" byte[], I decode each byte[] into its parts (key/weight/analyzed form) and do the comparison "directly". This is cleaner because we no longer need to rely on separators that then cause 0 bytes to not work...
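
The decode-and-compare idea described above could look roughly like the following sketch in plain Java. It is not the code from the patch: the length-prefixed layout, the class `DecodingComparatorDemo`, and all method names are hypothetical, and only key and weight are modeled (the analyzed form is left out). Ties on the key are broken by ascending weight, which is an arbitrary choice for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DecodingComparatorDemo {
    // Hypothetical layout with no separator: 2-byte key length, key bytes, 4-byte weight.
    static byte[] encode(byte[] key, int weight) {
        ByteBuffer buf = ByteBuffer.allocate(2 + key.length + 4);
        buf.putShort((short) key.length);
        buf.put(key);
        buf.putInt(weight);
        return buf.array();
    }

    static byte[] decodeKey(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        byte[] key = new byte[buf.getShort() & 0xFFFF];
        buf.get(key);
        return key;
    }

    static int decodeWeight(byte[] entry) {
        // The weight occupies the last 4 bytes of the entry.
        return ByteBuffer.wrap(entry).getInt(entry.length - 4);
    }

    // Unsigned lexicographic comparison; on a shared prefix the shorter array sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Decodes both entries and compares the parts directly, so 0 bytes
    // inside a key are treated like any other byte.
    static final Comparator<byte[]> BY_KEY_THEN_WEIGHT = (a, b) -> {
        int cmp = compareBytes(decodeKey(a), decodeKey(b));
        return cmp != 0 ? cmp : Integer.compare(decodeWeight(a), decodeWeight(b));
    };

    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add(encode(new byte[] {0x61, 0x00}, 1));
        entries.add(encode(new byte[] {0x61}, 5));
        entries.sort(BY_KEY_THEN_WEIGHT);
        // The one-byte key sorts first even though the other key contains a 0 byte.
        System.out.println(decodeKey(entries.get(0)).length); // 1
    }
}
```

Because the comparator never looks at the raw packed bytes, the result does not depend on which byte values appear in the key.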

        Robert Muir added a comment -

        +1!

        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1405978

        LUCENE-4534: dedup same surface form in Analyzing/FuzzySuggester

        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1405963

        LUCENE-4534: handle 0 byte values in lookup keys


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            1
