Solr / SOLR-1710

Convert WordDelimiterFilter to the new TokenStream API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      This one was a doozy. Attached is a patch that converts WordDelimiterFilter to the new TokenStream API.

      Some of the logic was split out into WordDelimiterIterator, which exposes a BreakIterator-like API for iterating over subwords.
      The filter is much more efficient now, since it no longer does any cloning.
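
      To illustrate what a BreakIterator-like API for subwords might look like, here is a minimal sketch. It is not the WordDelimiterIterator from the patch: the class name SubwordIterator, its methods, and its splitting rules (break on non-alphanumeric characters, on lower-to-upper case changes, and on letter/digit changes) are all assumptions made for illustration. The point is that the caller only tracks offsets into a shared buffer, so no token objects need to be copied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a BreakIterator-like subword iterator. */
final class SubwordIterator {
    static final int DONE = -1;

    private final char[] text;
    private int current = 0; // start offset of the current subword
    private int end = 0;     // end offset (exclusive) of the current subword

    SubwordIterator(String s) {
        this.text = s.toCharArray();
    }

    /** Advances to the next subword; returns its end offset, or DONE if none is left. */
    int next() {
        current = end;
        // Skip delimiter characters between subwords.
        while (current < text.length && !Character.isLetterOrDigit(text[current])) {
            current++;
        }
        if (current >= text.length) {
            end = current;
            return DONE;
        }
        end = current + 1;
        while (end < text.length && !isBreak(text[end - 1], text[end])) {
            end++;
        }
        return end;
    }

    int current() { return current; }

    int end() { return end; }

    /** Assumed rules: delimiter, lower-to-upper transition, or letter/digit transition. */
    private static boolean isBreak(char prev, char cur) {
        if (!Character.isLetterOrDigit(cur)) return true;
        if (Character.isLowerCase(prev) && Character.isUpperCase(cur)) return true;
        return Character.isLetter(prev) != Character.isLetter(cur);
    }

    /** Convenience helper that collects every subword by walking the offsets. */
    static List<String> subwords(String s) {
        SubwordIterator it = new SubwordIterator(s);
        List<String> out = new ArrayList<>();
        while (it.next() != DONE) {
            out.add(s.substring(it.current(), it.end()));
        }
        return out;
    }
}
```

      Under these assumed rules, SubwordIterator.subwords("PowerShot-SD500") yields Power, Shot, SD, 500. The real filter supports more options than this sketch covers.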

      Before applying the patch, copy the existing WordDelimiterFilter to OriginalWordDelimiterFilter.
      The patch includes a test case (TestWordDelimiterBWComp) that generates random strings from various subword combinations.
      For each random string, it compares the output against that of the existing WordDelimiterFilter for all 512 combinations of the boolean parameters.

      NOTE: Due to the bugs found (SOLR-1706), the test currently covers only 256 of these combinations. The bugs discovered in SOLR-1706 are fixed here.
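
      The backwards-compatibility check described above can be sketched roughly as follows. This is not the code of TestWordDelimiterBWComp: the class FlagComboCheck, its method names, and the use of BiFunction to stand in for the old and new filters are hypothetical. Since 512 = 2^9, the sketch assumes nine independent boolean parameters and enumerates them as the bits of an integer mask.

```java
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;

/** Hypothetical harness; none of these names come from the patch. */
final class FlagComboCheck {
    // 512 = 2^9, so nine independent boolean parameters are assumed.
    static final int NUM_FLAGS = 9;

    /** Decodes bit i of the mask into flag i. */
    static boolean[] decode(int mask) {
        boolean[] flags = new boolean[NUM_FLAGS];
        for (int i = 0; i < NUM_FLAGS; i++) {
            flags[i] = (mask & (1 << i)) != 0;
        }
        return flags;
    }

    /** Builds a random string by concatenating between 1 and maxParts fragments. */
    static String randomString(Random random, String[] fragments, int maxParts) {
        StringBuilder sb = new StringBuilder();
        int parts = 1 + random.nextInt(maxParts);
        for (int i = 0; i < parts; i++) {
            sb.append(fragments[random.nextInt(fragments.length)]);
        }
        return sb.toString();
    }

    /** Counts the (input, flag combination) pairs on which the two analyzers disagree. */
    static int countMismatches(
            BiFunction<String, boolean[], List<String>> reference,
            BiFunction<String, boolean[], List<String>> candidate,
            List<String> inputs) {
        int mismatches = 0;
        for (String input : inputs) {
            for (int mask = 0; mask < (1 << NUM_FLAGS); mask++) {
                boolean[] flags = decode(mask);
                if (!reference.apply(input, flags).equals(candidate.apply(input, flags))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }
}
```

      In a real test, reference and candidate would wrap the original and the converted filter, and any nonzero mismatch count would fail the test. Restricting the mask loop to the combinations that avoid a known bug is one way to end up with 256 of the 512 cases.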

        Attachments

        1. SOLR-1710-readable.patch
          59 kB
          Chris Male
        2. SOLR-1710-readable.patch
          59 kB
          Chris Male
        3. SOLR-1710.patch
          48 kB
          Robert Muir
        4. SOLR-1710.patch
          48 kB
          Robert Muir

              People

              • Assignee: Mark Miller (markrmiller@gmail.com)
              • Reporter: Robert Muir (rcmuir)