Lucene - Core / LUCENE-3767

Explore streaming Viterbi search in Kuromoji


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've been playing with the idea of changing the Kuromoji Viterbi
      search to use 2 passes (intersect, backtrace) instead of 4 passes
      (break into sentences, intersect, score, backtrace). This is very
      much a work in progress, so I'm just posting my current state.
      It's got tons of nocommits and doesn't yet properly handle the user
      dictionary or the extended modes, etc.
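A minimal sketch of the two-pass idea, written in Java since Kuromoji is a Java module. The first pass walks the input once, looking up the dictionary words that start at each reachable position and scoring them immediately, so there is no separate sentence-splitting or scoring pass. The second pass backtraces from the end. The class name, the toy dictionary and its costs are hypothetical. Connection costs between adjacent words and unknown-word handling are left out, so this is an illustration of the pass structure, not the Kuromoji implementation.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass segmenter: pass 1 intersects and scores in one sweep,
// pass 2 backtraces. Toy dictionary costs; no connection costs, no unknown words.
public class TwoPassViterbi {

    public static List<String> segment(String text, Map<String, Integer> dict, int maxWordLen) {
        int n = text.length();
        int[] bestCost = new int[n + 1]; // cheapest path cost ending at position i
        int[] backPtr = new int[n + 1];  // start of the last word on that cheapest path
        Arrays.fill(bestCost, Integer.MAX_VALUE);
        Arrays.fill(backPtr, -1);
        bestCost[0] = 0;

        // Pass 1: for each reachable position, look up every dictionary word that
        // starts there and relax the cost of the position where that word ends.
        for (int start = 0; start < n; start++) {
            if (bestCost[start] == Integer.MAX_VALUE) {
                continue; // no path reaches this position
            }
            for (int end = start + 1; end <= Math.min(n, start + maxWordLen); end++) {
                Integer wordCost = dict.get(text.substring(start, end));
                if (wordCost != null && bestCost[start] + wordCost < bestCost[end]) {
                    bestCost[end] = bestCost[start] + wordCost;
                    backPtr[end] = start;
                }
            }
        }
        if (bestCost[n] == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("no segmentation for: " + text);
        }

        // Pass 2: follow back pointers from the end, prepending each word.
        LinkedList<String> tokens = new LinkedList<>();
        for (int end = n; end > 0; end = backPtr[end]) {
            tokens.addFirst(text.substring(backPtr[end], end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("東", 5, "京", 5, "都", 2, "東京", 2, "京都", 3);
        System.out.println(segment("東京都", dict, 4)); // [東京, 都]
    }
}
```

With the toy costs, 東京 + 都 (cost 4) beats 東 + 京都 (cost 8) and 東 + 京 + 都 (cost 12), so the backtrace yields [東京, 都].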

      One thing I'm playing with is adding a double backtrace for long
      compound tokens, i.e., instead of penalizing these tokens so that
      shorter tokens are picked, leave their scores unchanged, but on
      backtrace take that penalty and use it as a threshold for a
      second-best segmentation...
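One possible reading of the double-backtrace idea, again as a hypothetical Java sketch. Scores are left unpenalized during the forward search. After the backtrace, each token longer than a cutoff is re-searched using only shorter dictionary words, which gives the best segmentation of that span other than the compound itself. That alternative is used if its cost is within the token's length penalty of the compound's cost. The cutoff (`LONG_TOKEN_LEN`), the `penalty` function and the dictionary costs are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: long tokens are NOT penalized during scoring; instead the
// penalty becomes a threshold on backtrace for a second-best segmentation.
public class CompoundBacktrace {

    static final int LONG_TOKEN_LEN = 2; // hypothetical: longer tokens count as compounds

    // Hypothetical length penalty, used only as a backtrace threshold.
    static int penalty(int len) {
        return len > LONG_TOKEN_LEN ? 1000 * (len - LONG_TOKEN_LEN) : 0;
    }

    // Cheapest segmentation of text[from, to) using words of at most maxLen chars.
    // Stores the path cost in costOut[0]; returns null if the span cannot be segmented.
    static List<String> bestPath(String text, int from, int to,
                                 Map<String, Integer> dict, int maxLen, int[] costOut) {
        int n = to - from;
        int[] best = new int[n + 1];
        int[] back = new int[n + 1];
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int s = 0; s < n; s++) {
            if (best[s] == Integer.MAX_VALUE) {
                continue;
            }
            for (int e = s + 1; e <= Math.min(n, s + maxLen); e++) {
                Integer c = dict.get(text.substring(from + s, from + e));
                if (c != null && best[s] + c < best[e]) {
                    best[e] = best[s] + c;
                    back[e] = s;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) {
            return null;
        }
        costOut[0] = best[n];
        LinkedList<String> path = new LinkedList<>();
        for (int e = n; e > 0; e = back[e]) {
            path.addFirst(text.substring(from + back[e], from + e));
        }
        return path;
    }

    public static List<String> segment(String text, Map<String, Integer> dict) {
        int[] cost = new int[1];
        // First search and backtrace: unpenalized, so compounds compete on raw cost.
        List<String> best = bestPath(text, 0, text.length(), dict, text.length(), cost);
        if (best == null) {
            throw new IllegalArgumentException("no segmentation for: " + text);
        }
        List<String> result = new ArrayList<>();
        int pos = 0;
        for (String tok : best) {
            int len = tok.length();
            int threshold = dict.get(tok) + penalty(len);
            List<String> alt = null;
            if (penalty(len) > 0) {
                // Second search over the compound's span using only shorter words,
                // i.e. the best segmentation of that span other than the compound.
                int[] altCost = new int[1];
                alt = bestPath(text, pos, pos + len, dict, len - 1, altCost);
                if (alt != null && altCost[0] > threshold) {
                    alt = null; // too expensive: keep the compound
                }
            }
            result.addAll(alt != null ? alt : List.of(tok));
            pos += len;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> cheapParts = Map.of("関西国際空港", 4000, "関西", 2000, "国際", 1500, "空港", 1500);
        Map<String, Integer> costlyParts = Map.of("関西国際空港", 4000, "関西", 5000, "国際", 5000, "空港", 5000);
        System.out.println(segment("関西国際空港", cheapParts));  // [関西, 国際, 空港]
        System.out.println(segment("関西国際空港", costlyParts)); // [関西国際空港]
    }
}
```

In the first dictionary the compound wins the unpenalized search (4000 vs 5000), but its parts cost 5000, within the threshold of 4000 + 4000, so the parts are emitted; in the second the parts cost 15000 and the compound is kept. This sketch replaces the compound with its parts; emitting both is an equally plausible variant.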

        Attachments

        1. compound_diffs.txt
          48 kB
          Michael McCandless
        2. LUCENE-3767_branch_3x.patch
          187 kB
          Christian Moen
        3. LUCENE-3767.patch
          186 kB
          Michael McCandless
        4. LUCENE-3767.patch
          184 kB
          Michael McCandless
        5. LUCENE-3767.patch
          188 kB
          Michael McCandless
        6. LUCENE-3767.patch
          68 kB
          Michael McCandless
        7. LUCENE-3767.patch
          59 kB
          Michael McCandless
        8. SolrXml-5498.xml
          31 kB
          Christian Moen


              People

              • Assignee:
                Christian Moen
                Reporter:
                Michael McCandless
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: