Lucene - Core / LUCENE-3699

kuromoji dictionary could be more compact

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Reading through the ipadic documentation, I realized we are storing a lot of redundant information.
      For example, the connection costs for bigram weights are based on POS+inflection data, so it's redundant
      to also encode POS and inflection data separately for each entry.

      With the patch, dictionary access is also faster and simpler, and TokenInfoDictionary is 1.5 MB smaller.
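
      The redundancy described above can be illustrated with a minimal sketch. It is not the
      attached patch and does not use the real Kuromoji classes: the class name, field names,
      and the assumption that a single context ID uniquely determines the POS+inflection string
      are all hypothetical. The point is only that when each entry already stores the ID used
      to look up connection costs, the POS+inflection text can be kept once per ID in a shared
      table instead of once per entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the Kuromoji API or the LUCENE-3699 patch. */
final class CompactDictionarySketch {

    /** Per-entry record: just an ID and a cost, no POS or inflection strings. */
    static final class Entry {
        final short contextId; // assumed to be the same ID that indexes connection costs
        final short wordCost;

        Entry(short contextId, short wordCost) {
            this.contextId = contextId;
            this.wordCost = wordCost;
        }
    }

    // Shared table: one POS+inflection string per context ID.
    private final List<String> posByContextId = new ArrayList<>();
    private final Map<String, Short> contextIdByPos = new HashMap<>();
    private final Map<String, Entry> entries = new HashMap<>();

    /** Adds an entry, reusing the context ID if this POS+inflection was seen before. */
    void add(String surface, String posAndInflection, int wordCost) {
        Short id = contextIdByPos.get(posAndInflection);
        if (id == null) {
            id = (short) posByContextId.size();
            posByContextId.add(posAndInflection);
            contextIdByPos.put(posAndInflection, id);
        }
        entries.put(surface, new Entry(id, (short) wordCost));
    }

    /** Recovers POS+inflection from the entry's context ID; null if the surface is unknown. */
    String partOfSpeech(String surface) {
        Entry e = entries.get(surface);
        return e == null ? null : posByContextId.get(e.contextId);
    }

    /** Number of distinct POS+inflection strings actually stored. */
    int distinctPosCount() {
        return posByContextId.size();
    }
}
```

      In this sketch, adding many entries that share a POS+inflection class grows the shared
      table by only one string, while each entry stays a fixed pair of shorts.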

      Attachments

        1. LUCENE-3699.patch
          21 kB
          Robert Muir
        2. LUCENE-3699_more.patch
          16 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Robert Muir