  1. Lucene - Core
  2. LUCENE-3699

kuromoji dictionary could be more compact

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Reading through the ipadic documentation, I realized we are storing a lot of redundant information.
      For example, the connection costs for bigram weights are based on POS+inflection data, so it's redundant
      to also encode POS and inflection data separately for each entry.

      With the patch, dictionary access is also faster and simpler, and TokenInfoDictionary is 1.5MB smaller.
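
      To illustrate the redundancy described above, here is a minimal sketch of one way to
      avoid it: POS and inflection data are stored once per connection ID, and each entry
      keeps only that ID plus its word cost. All class, field, and method names are
      hypothetical and the sample values are made up; this is not the code from the
      attached patches, and it assumes that a connection ID fully determines the
      POS+inflection pair.

```java
// Illustrative sketch only: names and sample data are hypothetical,
// not taken from the LUCENE-3699 patches.
final class CompactDictionarySketch {
    // Stored once per connection ID (the ID used to index the bigram cost matrix).
    private final String[] posByConnectionId;
    private final String[] inflectionByConnectionId;

    // Stored once per dictionary entry: only the connection ID and the word cost.
    private final short[] connectionIdByEntry;
    private final short[] wordCostByEntry;

    CompactDictionarySketch(String[] pos, String[] inflection,
                            short[] connectionIds, short[] wordCosts) {
        this.posByConnectionId = pos;
        this.inflectionByConnectionId = inflection;
        this.connectionIdByEntry = connectionIds;
        this.wordCostByEntry = wordCosts;
    }

    // POS is derived from the entry's connection ID instead of being stored per entry.
    String partOfSpeech(int entry) {
        return posByConnectionId[connectionIdByEntry[entry]];
    }

    // Inflection is derived the same way.
    String inflection(int entry) {
        return inflectionByConnectionId[connectionIdByEntry[entry]];
    }

    int wordCost(int entry) {
        return wordCostByEntry[entry];
    }

    // Builds a tiny made-up dictionary: three entries sharing two connection IDs.
    static CompactDictionarySketch example() {
        String[] pos = {"noun", "verb"};
        String[] inflection = {"*", "base form"};
        short[] connectionIds = {0, 1, 0};
        short[] wordCosts = {100, 250, 80};
        return new CompactDictionarySketch(pos, inflection, connectionIds, wordCosts);
    }

    public static void main(String[] args) {
        CompactDictionarySketch dict = example();
        for (int entry = 0; entry < 3; entry++) {
            System.out.println(entry + ": " + dict.partOfSpeech(entry)
                + " / " + dict.inflection(entry) + " / cost " + dict.wordCost(entry));
        }
    }
}
```

      In this layout, entries 0 and 2 share one POS/inflection row, so adding more entries
      with the same connection ID costs only two short values each.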

        Attachments

        1. LUCENE-3699_more.patch
          16 kB
          Robert Muir
        2. LUCENE-3699.patch
          21 kB
          Robert Muir

            People

            • Assignee: Unassigned
            • Reporter: Robert Muir (rcmuir)