Solr / SOLR-5152

EdgeNGramFilterFactory deletes token

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.4
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I am using EdgeNGramFilterFactory in my schema.xml

      <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <!-- ... -->
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" side="front" />
        </analyzer>
      </fieldType>

      Some tokens in my index consist of only one character, say R. minGramSize is set to 2, which is larger than the length of such a token. I expected the filter to leave R unchanged, but in fact it deletes the token.

      This behavior is undesirable for my use case, and probably for most other use cases as well.
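
      The reported behavior can be illustrated with a small standalone sketch. This is not the Lucene/Solr implementation, only a simplified model of front edge n-gram generation; the function name edge_ngrams and the keep_short flag are hypothetical and exist purely to contrast the observed result with the expected one.

```python
def edge_ngrams(token, min_gram=2, max_gram=10, keep_short=False):
    """Return the front edge n-grams of `token` with lengths min_gram..max_gram.

    Simplified model, not the Solr code. `keep_short` is a hypothetical
    switch: when True, a token shorter than min_gram is emitted unchanged
    instead of being dropped.
    """
    longest = min(max_gram, len(token))
    # No gram length fits when the token is shorter than min_gram,
    # so the range is empty and the token yields nothing.
    grams = [token[:n] for n in range(min_gram, longest + 1)]
    if not grams and keep_short and token:
        grams = [token]
    return grams


print(edge_ngrams("Solr"))                # grams of length 2, 3 and 4
print(edge_ngrams("R"))                   # observed: the token disappears
print(edge_ngrams("R", keep_short=True))  # expected: the token is kept
```

      With minGramSize="2", the single-character token produces no grams at all in this model, which matches the deletion described above; the keep_short variant shows the behavior the report asks for.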

        Attachments

        1. SOLR-5152.patch
          22 kB
          Furkan Kamaci
        2. SOLR-5152-v5.0.0.patch
          19 kB
          Sergey Urushkin

              People

              • Assignee:
                Unassigned
                Reporter:
                christoph.lingg Christoph Lingg
              • Votes:
                4
                Watchers:
                8
