Lucene - Core / LUCENE-1227

NGramTokenizer to handle more than 1024 chars


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      The current NGramTokenizer can't handle a character stream longer than 1024 characters. This is too short for non-whitespace-separated languages.

      I have created a patch for this issue.
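
      To illustrate the limitation, here is a minimal sketch (not part of the issue itself) that feeds NGramTokenizer an input longer than 1024 characters and counts the emitted bigrams. It is written against the later attribute-based TokenStream API (NGramTokenizer(minGram, maxGram), setReader, incrementToken), which differs from the 2.3-era API this patch targets; the class name NGramLengthDemo and the 2048-character synthetic input are illustrative only. With the truncating behavior described above, tokenization stops once the first 1024 characters have been consumed.

      import java.io.StringReader;

      import org.apache.lucene.analysis.Tokenizer;
      import org.apache.lucene.analysis.ngram.NGramTokenizer;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

      public class NGramLengthDemo {
          public static void main(String[] args) throws Exception {
              // Build an input longer than the tokenizer's 1024-char internal buffer.
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < 2048; i++) {
                  sb.append((char) ('a' + (i % 26)));
              }

              // Emit bigrams over the whole stream.
              Tokenizer tokenizer = new NGramTokenizer(2, 2);
              tokenizer.setReader(new StringReader(sb.toString()));
              CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);

              tokenizer.reset();
              int count = 0;
              String lastBigram = null;
              while (tokenizer.incrementToken()) {
                  lastBigram = term.toString();
                  count++;
              }
              tokenizer.end();
              tokenizer.close();

              // A 2048-char input should yield 2047 bigrams once the tokenizer
              // reads past the 1024-char boundary; a truncating tokenizer stops short.
              System.out.println("bigrams emitted: " + count + ", last bigram: " + lastBigram);
          }
      }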

      Attachments

        1. LUCENE-1227.patch (12 kB) - Hiroaki Kawai
        2. NGramTokenizer.patch (3 kB) - Hiroaki Kawai
        3. NGramTokenizer.patch (3 kB) - Hiroaki Kawai

            People

              Assignee: Unassigned
              Reporter: Hiroaki Kawai
              Votes: 3
              Watchers: 7
