SOLR-41: PATCH: HyphenatedWordsFilter, Factory and test

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: search
    • Labels:
      None

      Description

      When plain text is extracted from documents, many words are often hyphenated and broken across two lines. This is common in documents that use narrow text columns, such as newsletters.
      So that searches can match the whole word, this filter rejoins hyphenated words that were broken across two lines.
      This filter must be used together with the WordDelimiterFilter configured with catenateWords=1.
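
      The joining step described above can be illustrated with a minimal standalone sketch. It is not the code from the attached patch: the class name HyphenJoinSketch and the method join are hypothetical, and it works on a plain list of strings rather than a Lucene TokenStream. It assumes the text was split on whitespace, so a line-break hyphen stays attached to the first half of the word, and it drops that hyphen itself. The actual filter may instead leave part of this work to the WordDelimiterFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not the code from the attached patch.
class HyphenJoinSketch {

    // Joins a token that ends in '-' with the token that follows it,
    // dropping the trailing hyphen. Assumes whitespace tokenization, so
    // "ecologi-\ncal" arrives as the two tokens "ecologi-" and "cal".
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String pending = null; // first part of a broken word, hyphen already removed
        for (String token : tokens) {
            String current = (pending == null) ? token : pending + token;
            if (current.length() > 1 && current.endsWith("-")) {
                // Still waiting for the continuation on the next line.
                pending = current.substring(0, current.length() - 1);
            } else {
                out.add(current);
                pending = null;
            }
        }
        if (pending != null) {
            // Input ended on a hyphenated fragment: emit it unchanged.
            out.add(pending + "-");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("ecologi-", "cal", "well-known", "news");
        System.out.println(join(tokens));
    }
}
```

      In this sketch a hyphen inside a token, as in "well-known", is left alone; only a trailing hyphen triggers a join.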

        Attachments

        1. TestHyphenatedWordsFilter.java
          2 kB
          Boris Vitez
        2. HyphenatedWordsFilterFactory.java
          1.0 kB
          Boris Vitez
        3. hyphenatedwordsfilter.patch
          7 kB
          Boris Vitez
        4. hyphenatedwordsfilter.patch
          7 kB
          Boris Vitez
        5. HyphenatedWordsFilter.java
          4 kB
          Boris Vitez

            People

            • Assignee:
              yseeley@gmail.com Yonik Seeley
              Reporter:
              bole5 Boris Vitez
            • Votes:
              0
              Watchers:
              1
