Solr / SOLR-1710

Convert WordDelimiterFilter to the new TokenStream API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      This one was a doozy; attached is a patch to convert it to the new TokenStream API.

      Some of the logic was split out into WordDelimiterIterator, which exposes a BreakIterator-like API for iterating over subwords.
      The filter is also much more efficient now, since it no longer clones tokens.
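      The iterator idea can be illustrated with a small, self-contained sketch. This is not the WordDelimiterIterator from the patch: the class name SubwordIteratorSketch, its simplified splitting rules (break on non-alphanumeric delimiters, on lowercase-to-uppercase changes, and on letter/digit transitions), and its next()/current() protocol are all assumptions, loosely following the DONE convention of java.text.BreakIterator. The iterator itself only reports offsets into the original text; the subwords() helper copies substrings purely so the result can be printed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (NOT the actual WordDelimiterIterator): walks the
 * subwords of a token BreakIterator-style. next() returns the end offset of
 * the next subword, or DONE when exhausted; current() returns its start.
 * Assumed, simplified rules: delimiters are non-letter/non-digit chars, and a
 * new subword starts on lower->upper case change or letter<->digit change.
 */
public class SubwordIteratorSketch {
    public static final int DONE = -1;

    private final char[] text;
    private int start = 0; // start offset of the current subword
    private int end = 0;   // exclusive end offset of the current subword

    public SubwordIteratorSketch(String s) {
        this.text = s.toCharArray();
    }

    /** Start offset of the subword most recently returned by next(). */
    public int current() {
        return start;
    }

    /** Advances to the next subword; returns its end offset or DONE. */
    public int next() {
        int i = end;
        // Skip any delimiters before the next subword.
        while (i < text.length && !Character.isLetterOrDigit(text[i])) {
            i++;
        }
        if (i >= text.length) {
            return DONE;
        }
        start = i;
        i++;
        // Extend the subword until a boundary is found.
        while (i < text.length && !isBreak(text[i - 1], text[i])) {
            i++;
        }
        end = i;
        return end;
    }

    // True if a subword boundary falls between prev (inside a subword) and cur.
    private static boolean isBreak(char prev, char cur) {
        if (!Character.isLetterOrDigit(cur)) return true;                     // delimiter
        if (Character.isLetter(prev) != Character.isLetter(cur)) return true; // letter <-> digit
        return Character.isLowerCase(prev) && Character.isUpperCase(cur);     // "wiFi" -> "wi", "Fi"
    }

    /** Collects all subwords by driving the iterator the way a filter might. */
    public static List<String> subwords(String s) {
        SubwordIteratorSketch it = new SubwordIteratorSketch(s);
        List<String> out = new ArrayList<>();
        for (int e = it.next(); e != DONE; e = it.next()) {
            out.add(s.substring(it.current(), e));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Wi, Fi, 4000, Power, Shot]
        System.out.println(subwords("Wi-Fi4000PowerShot"));
    }
}
```

      The offset-based protocol is what makes a no-cloning design plausible: a filter can read the start and end positions and write the subword into a single reused term buffer instead of allocating a new token for each part.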

      Before applying the patch, copy the existing WordDelimiterFilter to OriginalWordDelimiterFilter.
      The patch includes a test case (TestWordDelimiterBWComp) that generates random strings from various subword combinations.
      For each random string, it compares the new filter's output against the existing WordDelimiterFilter for all 512 combinations of boolean parameters.
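      A full sweep over boolean parameters is commonly done by treating each bit of an integer as one flag. The sketch below is a hypothetical illustration, not code from TestWordDelimiterBWComp: only the flag count (9, giving 2^9 = 512 combinations) follows from the description, while the class name FlagCombinations and the mapping of bits to specific filter options are assumptions left generic here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: enumerate every combination of N boolean parameters
 * by treating bit i of an int mask as flag i. With N = 9 this visits
 * 2^9 = 512 configurations. Which filter option each bit controls is an
 * assumption and is not modeled here.
 */
public class FlagCombinations {
    public static final int NUM_FLAGS = 9;

    /** Decodes bit i of mask into flags[i]. */
    public static boolean[] decode(int mask, int numFlags) {
        boolean[] flags = new boolean[numFlags];
        for (int i = 0; i < numFlags; i++) {
            flags[i] = (mask & (1 << i)) != 0;
        }
        return flags;
    }

    /** Visits every mask and returns how many distinct configurations were seen. */
    public static int sweep(int numFlags) {
        Set<String> seen = new HashSet<>();
        for (int mask = 0; mask < (1 << numFlags); mask++) {
            boolean[] flags = decode(mask, numFlags);
            // In a comparison test, both filter implementations would be built
            // with `flags` here and their outputs compared for each input.
            seen.add(Arrays.toString(flags));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(sweep(NUM_FLAGS)); // prints 512
    }
}
```

      Each additional flag doubles the sweep, so holding any one flag fixed halves it.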

      NOTE: because of the bugs found in SOLR-1706, the test currently covers only 256 of these combinations. Those SOLR-1706 bugs are fixed by this patch.

      Attachments

      1. SOLR-1710.patch (48 kB, Robert Muir)
      2. SOLR-1710-readable.patch (59 kB, Chris Male)


            People

            • Assignee: Mark Miller
            • Reporter: Robert Muir
            • Votes: 0
            • Watchers: 0
