Details

      Description

      Some token filters use the position length attribute of the token stream to encode the number of terms they put in a single token.
      This breaks query parsing because it creates a disconnected graph.
      I've tracked the misuse down to two filters:

      • ShingleFilter, which sets the position length attribute to the length of the shingle.
      • CJKBigramFilter, which always sets the position length attribute to 2.

      I don't think these filters should set the position length at all, so the best fix would be to remove the attribute from them, but this could break backward compatibility (BWC).
      Still, this is a serious bug, since shingles and CJK bigrams now produce invalid queries.
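
To illustrate why these position lengths disconnect the graph, here is a minimal sketch that uses a simplified token model rather than the real Lucene classes. `Tok`, `TokenGraphCheck`, and `unreachableTerms` are hypothetical names introduced for this example, and it assumes shingles of size 2 with no unigrams emitted. Each token is treated as an edge from its start position to start + position length, and a token is reported when no earlier token ends at its start position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a token in a token stream (hypothetical, not the Lucene API).
final class Tok {
    final String term;
    final int posInc; // position increment relative to the previous token
    final int posLen; // position length: number of positions the token spans

    Tok(String term, int posInc, int posLen) {
        this.term = term;
        this.posInc = posInc;
        this.posLen = posLen;
    }
}

final class TokenGraphCheck {
    // Returns the terms of tokens whose start position cannot be reached
    // from the position of the first token by following token edges.
    static List<String> unreachableTerms(List<Tok> tokens) {
        Set<Integer> reachable = new HashSet<>();
        List<String> unreachable = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            if (reachable.isEmpty()) {
                reachable.add(pos); // the first token's position is the start node
            }
            if (reachable.contains(pos)) {
                reachable.add(pos + t.posLen); // edge pos -> pos + posLen
            } else {
                unreachable.add(t.term);
            }
        }
        return unreachable;
    }

    public static void main(String[] args) {
        // Shingles of "a b c d" with position length 2, as described in this issue.
        List<Tok> broken = Arrays.asList(
            new Tok("a b", 1, 2), new Tok("b c", 1, 2), new Tok("c d", 1, 2));
        // Same shingles with position length left at 1.
        List<Tok> fixed = Arrays.asList(
            new Tok("a b", 1, 1), new Tok("b c", 1, 1), new Tok("c d", 1, 1));

        System.out.println(unreachableTerms(broken)); // prints [b c]
        System.out.println(unreachableTerms(fixed));  // prints []
    }
}
```

With position length 2, "a b" spans positions 0 to 2 while "b c" starts at position 1, which no token ends at, so "b c" sits on a path that is not connected to the start of the graph. With position length 1, every token starts where the previous one ends and the graph is a simple chain.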

        Attachments

        1. LUCENE-7708.patch
          6 kB
          Jim Ferenczi
        2. LUCENE-7708.patch
          6 kB
          Jim Ferenczi

            People

            • Assignee:
              Unassigned
              Reporter:
              jim.ferenczi Jim Ferenczi
            • Votes:
              0
              Watchers:
              6
