Solr / SOLR-1710

Convert WordDelimiterFilter to new TokenStream API

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      This one was a doozy. Attached is a patch to convert WordDelimiterFilter to the new TokenStream API.

      Some of the logic was split out into WordDelimiterIterator, which exposes a BreakIterator-like API for iterating over subwords.
      The filter is much more efficient now: no cloning.
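
      To illustrate what a BreakIterator-like API for subwords can look like, here is a minimal sketch. It is not the WordDelimiterIterator from the patch: the class name SubwordIterator, its fields, and its simplified break rules (case change, letter/digit change, delimiters) are assumptions made for this example only. The point is that the iterator exposes offsets into a shared char buffer, so a caller can walk subwords without cloning tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the WordDelimiterIterator from the patch.
// It walks a char buffer and exposes [current, end) offsets per subword.
final class SubwordIterator {
    static final int DONE = -1;

    private static final int LOWER = 1, UPPER = 2, DIGIT = 3, DELIM = 4;

    private final char[] text;
    private final int length;
    int current; // start offset of the current subword
    int end;     // end offset (exclusive) of the current subword

    SubwordIterator(char[] text, int length) {
        this.text = text;
        this.length = length;
    }

    // Simplified character classes; anything that is not a cased letter
    // or a digit is treated as a delimiter in this sketch.
    private static int type(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return DELIM;
    }

    // A break occurs at a delimiter or a type change, except that an
    // uppercase letter followed by lowercase stays together ("Power").
    private static boolean isBreak(int last, int next) {
        if (next == DELIM) return true;
        if (last == next) return false;
        return !(last == UPPER && next == LOWER);
    }

    /** Advances to the next subword; returns its end offset, or DONE. */
    int next() {
        current = end;
        while (current < length && type(text[current]) == DELIM) {
            current++; // skip leading delimiters
        }
        if (current >= length) {
            end = length;
            return DONE;
        }
        int last = type(text[current]);
        end = current + 1;
        while (end < length) {
            int t = type(text[end]);
            if (isBreak(last, t)) break;
            last = t;
            end++;
        }
        return end;
    }

    /** Convenience helper for the example: collects all subwords as strings. */
    static List<String> split(String s) {
        char[] buffer = s.toCharArray();
        SubwordIterator it = new SubwordIterator(buffer, buffer.length);
        List<String> parts = new ArrayList<>();
        while (it.next() != DONE) {
            parts.add(new String(buffer, it.current, it.end - it.current));
        }
        return parts;
    }
}
```

      A filter built on such an iterator would read it.current and it.end directly instead of materializing strings as split() does; split() exists here only to make the sketch easy to try.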

      Before applying the patch, copy the existing WordDelimiterFilter to OriginalWordDelimiterFilter.
      The patch includes a test case (TestWordDelimiterBWComp) that generates random strings from various subword combinations.
      For each random string, it compares the output against the existing WordDelimiterFilter for all 512 combinations of the boolean parameters.

      NOTE: due to bugs found (SOLR-1706), the test currently covers only 256 of these combinations. The bugs discovered in SOLR-1706 are fixed here.
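
      The backwards-compatibility check described above can be sketched roughly as follows. This is not TestWordDelimiterBWComp itself: the class FlagComboCheck, the PARTS array, and the use of BiFunction to stand in for the old and new filters are all assumptions for illustration. It assumes nine boolean parameters (2^9 = 512) and decodes each combination from a bitmask.

```java
import java.util.Random;
import java.util.function.BiFunction;

// Hypothetical sketch, NOT the TestWordDelimiterBWComp from the patch.
final class FlagComboCheck {
    static final int NUM_FLAGS = 9; // nine booleans give 2^9 = 512 combinations

    // Illustrative subword building blocks; the real test's inputs may differ.
    static final String[] PARTS = {"foo", "Bar", "123", "-", "'s", " "};

    /** Decodes bit i of mask into flags[i]. */
    static boolean[] decode(int mask) {
        boolean[] flags = new boolean[NUM_FLAGS];
        for (int i = 0; i < NUM_FLAGS; i++) {
            flags[i] = (mask & (1 << i)) != 0;
        }
        return flags;
    }

    /** Concatenates between 1 and maxParts randomly chosen parts. */
    static String randomString(Random random, int maxParts) {
        int n = 1 + random.nextInt(maxParts);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(PARTS[random.nextInt(PARTS.length)]);
        }
        return sb.toString();
    }

    /**
     * Compares two implementations on random inputs under every flag
     * combination. Returns the number of comparisons made and throws
     * IllegalStateException on the first mismatch.
     */
    static int compareAll(BiFunction<String, boolean[], String> oldImpl,
                          BiFunction<String, boolean[], String> newImpl,
                          long seed, int iterations) {
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        int compared = 0;
        for (int it = 0; it < iterations; it++) {
            String input = randomString(random, 5);
            for (int mask = 0; mask < (1 << NUM_FLAGS); mask++) {
                boolean[] flags = decode(mask);
                String expected = oldImpl.apply(input, flags);
                String actual = newImpl.apply(input, flags);
                if (!expected.equals(actual)) {
                    throw new IllegalStateException(
                        "mismatch for input '" + input + "' with mask " + mask);
                }
                compared++;
            }
        }
        return compared;
    }
}
```

      Restricting the loop to 256 combinations, as the NOTE describes, would amount to holding one flag fixed; which flag that is in the actual test is not stated here.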

      Attachments

        1. SOLR-1710-readable.patch
          59 kB
          Chris Male
        2. SOLR-1710-readable.patch
          59 kB
          Chris Male
        3. SOLR-1710.patch
          48 kB
          Robert Muir
        4. SOLR-1710.patch
          48 kB
          Robert Muir

            People

              Assignee: Mark Miller (markrmiller@gmail.com)
              Reporter: Robert Muir (rcmuir)