Lucene - Core: LUCENE-3681

FST.BYTE2 should save as fixed 2 byte not as vInt


    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None


      We currently write BYTE1 labels as a single byte but BYTE2 and BYTE4 labels as vInts, which I think is confusing. Also, for the FST used by the new Kuromoji analyzer (LUCENE-3305), writing BYTE2 labels as a fixed 2 bytes instead shrank the FST and made it faster, presumably because more labels were >= 16384 (which take 3 bytes as a vInt) than were < 128 (which take only 1 byte as a vInt).
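      To make the size trade-off concrete, here is a minimal, self-contained sketch (not Lucene code) that counts how many bytes a label needs as a Lucene-style vInt (7 data bits per byte, high bit set when more bytes follow) versus a fixed 2-byte encoding. The class and method names are hypothetical, and the big-endian byte order for the fixed encoding is an assumption made only for illustration.

```java
import java.io.ByteArrayOutputStream;

public class LabelEncodingSketch {
    // vInt as described above: low 7 bits per byte, high bit = "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Fixed-width 2-byte encoding (big-endian is an assumption of this sketch).
    static void writeFixed2(ByteArrayOutputStream out, int i) {
        out.write((i >>> 8) & 0xFF);
        out.write(i & 0xFF);
    }

    static int vIntSize(int label) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, label);
        return out.size();
    }

    static int fixed2Size(int label) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFixed2(out, label);
        return out.size();
    }

    public static void main(String[] args) {
        // Latin 'A', hiragana 'a' (U+3042), and a kanji (U+6F22)
        int[] labels = {0x41, 0x3042, 0x6F22};
        for (int label : labels) {
            System.out.printf("U+%04X: vInt=%d bytes, fixed=%d bytes%n",
                    label, vIntSize(label), fixed2Size(label));
        }
    }
}
```

      Labels below 128 take 1 byte as a vInt, labels below 16384 (0x4000) take 2, and labels from 16384 up to 0xFFFF take 3, while the fixed encoding always takes 2. Most CJK ideographs (U+4E00 to U+9FFF) fall in the 3-byte vInt range, which is consistent with the Kuromoji observation above.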

      Separately, the whole INPUT_TYPE is very confusing... all it really does is "declare" the allowed range of characters in the input alphabet, and the only code that uses that is the write/readLabel methods (well, and some confusing sugar methods in Builder!). I'm not sure how to fix that yet...
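      The point that INPUT_TYPE only matters to write/readLabel can be sketched as follows. This is a hypothetical, self-contained illustration using java.nio.ByteBuffer, not Lucene's actual FST code: the enum and method names are stand-ins, BYTE2 is written as a fixed 2 bytes as this issue proposes, and BYTE4 is kept as a vInt because the issue title only concerns BYTE2.

```java
import java.nio.ByteBuffer;

public class InputTypeSketch {
    // Hypothetical stand-in for the FST input types: each one only declares
    // the allowed range of label values in the input alphabet.
    enum InputType { BYTE1, BYTE2, BYTE4 }

    static void writeLabel(ByteBuffer buf, InputType type, int label) {
        switch (type) {
            case BYTE1:
                if (label < 0 || label > 0xFF) {
                    throw new IllegalArgumentException("label out of BYTE1 range: " + label);
                }
                buf.put((byte) label);                // always 1 byte
                break;
            case BYTE2:
                if (label < 0 || label > 0xFFFF) {
                    throw new IllegalArgumentException("label out of BYTE2 range: " + label);
                }
                buf.putShort((short) label);          // always 2 bytes
                break;
            default: {
                // BYTE4: variable-length vInt, 7 data bits per byte
                int v = label;
                while ((v & ~0x7F) != 0) {
                    buf.put((byte) ((v & 0x7F) | 0x80));
                    v >>>= 7;
                }
                buf.put((byte) v);
            }
        }
    }

    static int readLabel(ByteBuffer buf, InputType type) {
        switch (type) {
            case BYTE1:
                return buf.get() & 0xFF;
            case BYTE2:
                return buf.getShort() & 0xFFFF;       // undo sign extension
            default: {
                byte b = buf.get();
                int v = b & 0x7F;
                for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                    b = buf.get();
                    v |= (b & 0x7F) << shift;
                }
                return v;
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        writeLabel(buf, InputType.BYTE2, 0x6F22);     // 2 bytes
        writeLabel(buf, InputType.BYTE4, 0x6F22);     // 3 bytes
        buf.flip();
        System.out.println(readLabel(buf, InputType.BYTE2));
        System.out.println(readLabel(buf, InputType.BYTE4));
    }
}
```

      Nothing else in the sketch consults the input type, which mirrors the observation that INPUT_TYPE is effectively just a label-width declaration.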

      It's a simple change, but it changes the FST binary format, so any users with FSTs out there will have to rebuild them (FST is marked experimental...).

      Attachments:
      1. LUCENE-3681.patch (3 kB, Michael McCandless)

        Issue Links
          This issue is related to LUCENE-3206.

        History (newest first)
          Uwe Schindler changed Status from Resolved to Closed.
          Michael McCandless changed Status from Open to Resolved and set Resolution to Fixed.
          Michael McCandless attached LUCENE-3681.patch.
          Dawid Weiss linked this issue as related to LUCENE-3206.
          Michael McCandless created the issue.


            • Assignee:
              Michael McCandless
            • Votes: 0
            • Watchers: 0


              • Created: