Lucene - Core / LUCENE-3699

kuromoji dictionary could be more compact

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Reading through the ipadic documentation, I realized we are storing a lot of redundant information.
      For example, the connection costs for bigram weights are based on POS+inflection data, so it's redundant
      to also separately encode POS and inflection data for each entry.

      With the patch, dictionary access is also faster and simpler, and TokenInfoDictionary is 1.5 MB smaller.
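
The redundancy described above can be illustrated with a minimal Java sketch. It is not taken from the patch: the class names (`CompactDictionarySketch`, `PosTable`, `Entry`), the POS labels, and the cost values are all hypothetical, and it assumes that a single context ID per entry is enough to determine both POS and inflection. Under that assumption, the POS and inflection strings can live in one shared table indexed by context ID, and each word entry only needs the ID and its word cost.

```java
/** Hypothetical sketch; these are not the actual Lucene classes. */
final class CompactDictionarySketch {

    /** Shared table: one POS/inflection row per context ID, not one per word. */
    static final class PosTable {
        private final String[] pos;
        private final String[] inflection;

        PosTable(int contextIdCount) {
            pos = new String[contextIdCount];
            inflection = new String[contextIdCount];
        }

        /** Stores the row for a context ID, or checks that it matches the stored row. */
        void register(int contextId, String p, String infl) {
            if (pos[contextId] == null) {
                pos[contextId] = p;
                inflection[contextId] = infl;
            } else if (!pos[contextId].equals(p) || !inflection[contextId].equals(infl)) {
                // The assumption "context ID determines POS and inflection" does not hold.
                throw new IllegalStateException(
                    "context ID " + contextId + " maps to more than one POS/inflection pair");
            }
        }

        String posOf(int contextId) { return pos[contextId]; }

        String inflectionOf(int contextId) { return inflection[contextId]; }
    }

    /** Compact per-word entry: no POS or inflection fields, only the ID and cost. */
    static final class Entry {
        final short contextId;
        final short wordCost;

        Entry(int contextId, int wordCost) {
            this.contextId = (short) contextId;
            this.wordCost = (short) wordCost;
        }
    }

    public static void main(String[] args) {
        PosTable table = new PosTable(8);
        table.register(1, "noun-common", "*");
        table.register(1, "noun-common", "*");       // second word, same ID: no new row
        table.register(2, "verb-main", "base-form");

        Entry word = new Entry(2, 5200);              // made-up cost value
        System.out.println(table.posOf(word.contextId) + " / " + table.inflectionOf(word.contextId));
    }
}
```

In this layout the per-word cost of POS and inflection data drops to zero, and a lookup becomes a plain array index. The consistency check in `register` is there only to make the assumption explicit; whether the real dictionary build needs such a check is not stated in this issue.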

      1. LUCENE-3699_more.patch
        16 kB
        Robert Muir
      2. LUCENE-3699.patch
        21 kB
        Robert Muir


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 1
