Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Currently, WordDelimiterFilter doesn't set the posLen attribute (PositionLengthAttribute), so it creates graphs like this (see the attached before.png):

      With this patch (still a work in progress) it creates this graph instead (see the attached after.png):

      This means that, today, positional queries are buggy when WDF (WordDelimiterFilter) is used at search time. Now that LUCENE-7603 is fixed, this change should let you use positional queries with WDGF (WordDelimiterGraphFilter).

      I'm also trying to produce holes properly, which means removing the logic in the current WDF that swallows a hole when the whole token is just delimiters.
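
      To make the hole behavior concrete, here is a minimal sketch of how position increments turn into absolute positions. It is a simplified model, not code from the patch: the class HoleSketch, the method positions, and the input "foo -- bar" are all hypothetical, and the increments are made up for illustration rather than captured from the filter.

```java
import java.util.ArrayList;
import java.util.List;

public class HoleSketch {
    /** Converts position increments into absolute positions (a first token with posInc=1 lands at 0). */
    static List<Integer> positions(int... posIncs) {
        List<Integer> out = new ArrayList<>();
        int pos = -1; // assumed starting point, so the first increment of 1 gives position 0
        for (int inc : posIncs) {
            pos += inc;
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input "foo -- bar": the middle token is only delimiters and yields no output token.
        System.out.println(positions(1, 1)); // hole swallowed: foo=0, bar=1
        System.out.println(positions(1, 2)); // hole preserved: foo=0, bar=2
    }
}
```

      With the hole swallowed, "foo" and "bar" look adjacent even though something sat between them in the input; with the hole preserved, the gap stays visible to positional queries.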

      Surprisingly, it's actually quite easy to tweak WDF to create a graph (unlike e.g. SynonymGraphFilter) because it's already creating the necessary new positions, and its output graph never has side paths, except for single tokens that skip nodes because they have posLen > 1. I.e. the only fix to make, I think, is to set posLen properly. And it really helps that it does its own "new token buffering + sorting" already.
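
      As a rough illustration of why setting posLen is enough, the sketch below models each token as an arc from node `position` to node `position + posLen`. It is a standalone model under stated assumptions, not the filter's code: the Token class, the helper names, the input "wi-fi network", and the token order and values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PosLenSketch {
    /** Hypothetical stand-in for a token's term, position increment and position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;
        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns one "term:from->to" string per token, where to = from + posLen. */
    static List<String> arcs(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // assumed start, so a first posInc of 1 gives node 0
        for (Token t : tokens) {
            pos += t.posInc;
            out.add(t.term + ":" + pos + "->" + (pos + t.posLen));
        }
        return out;
    }

    /** True if some arc for `first` ends at the node where some arc for `second` starts. */
    static boolean adjacent(List<Token> tokens, String first, String second) {
        List<Integer> ends = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            if (t.term.equals(first)) ends.add(pos + t.posLen);
            if (t.term.equals(second)) starts.add(pos);
        }
        for (int end : ends) {
            if (starts.contains(end)) return true;
        }
        return false;
    }

    /** "wi-fi network" with every posLen left at 1 (posLen never set). */
    static List<Token> withoutPosLen() {
        return List.of(new Token("wifi", 1, 1), new Token("wi", 0, 1),
                       new Token("fi", 1, 1), new Token("network", 1, 1));
    }

    /** Same tokens, but the catenated "wifi" spans two positions. */
    static List<Token> withPosLen() {
        return List.of(new Token("wifi", 1, 2), new Token("wi", 0, 1),
                       new Token("fi", 1, 1), new Token("network", 1, 1));
    }

    public static void main(String[] args) {
        System.out.println(arcs(withoutPosLen())); // [wifi:0->1, wi:0->1, fi:1->2, network:2->3]
        System.out.println(arcs(withPosLen()));    // [wifi:0->2, wi:0->1, fi:1->2, network:2->3]
        System.out.println(adjacent(withoutPosLen(), "wifi", "network")); // false
        System.out.println(adjacent(withPosLen(), "wifi", "network"));    // true
    }
}
```

      In this model the positions are identical in both cases; only the posLen of "wifi" changes. That single change removes the spurious "wifi" then "fi" path and makes "wifi" directly precede "network", which is what a positional query needs.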

        Attachments

        1. LUCENE-7619.patch
          165 kB
          Michael McCandless
        2. LUCENE-7619.patch
          162 kB
          Michael McCandless
        3. LUCENE-7619.patch
          124 kB
          Michael McCandless
        4. after.png
          41 kB
          Michael McCandless
        5. before.png
          37 kB
          Michael McCandless

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              1
              Watchers:
              10