  1. Lucene - Core
  2. LUCENE-3069 Lucene should have an entirely memory resident term dictionary
  3. LUCENE-5029

factor out a generic 'TermState' for better sharing in FST-based term dict

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently, the two FST-based term dictionaries (the memory codec and blocktree) both use FST<BytesRef> as their base data structure. This may not share much data in parent arcs, because the encoded BytesRef does not guarantee that 'Outputs.common()' always produces a long prefix.

      With the current postings format, however, each FP (file pointer into .doc, .pos, etc.) is guaranteed to increase monotonically with 'larger' terms. That means that, between two Outputs, the Outputs from the smaller term can be safely pushed towards the root. We still have some tricky TermState to deal with (such as the singletonDocID for the pulsing trick), so, as Mike suggested, we can simply cut the whole TermState into two parts: one part for comparison and intersection, and another for restoring generic data. The data structure then becomes clear: this generic 'TermState' will consist of a fixed-length LongsRef and a variable-length BytesRef.
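
      The split described above can be illustrated with a minimal sketch. It is not the Lucene API and not the code in the attached patches: the class name TermStateSketch and the helpers common, subtract and add are hypothetical, chosen only to mirror the idea of an output "algebra". It assumes the fixed-length long part holds monotonically increasing values such as file pointers, so the element-wise minimum of two outputs can be pushed towards the root as a shared prefix, while the variable-length byte part is treated as opaque and stays with its own term.

```java
import java.util.Arrays;

/** Hypothetical sketch of a two-part term state; not the Lucene API. */
final class TermStateSketch {
    final long[] longs; // fixed-length part, assumed monotonic (e.g. file pointers)
    final byte[] bytes; // variable-length opaque part (e.g. a pulsed singleton doc ID)

    TermStateSketch(long[] longs, byte[] bytes) {
        this.longs = longs.clone();
        this.bytes = bytes.clone();
    }

    /** Shared prefix of two long parts: the element-wise minimum. */
    static long[] common(long[] a, long[] b) {
        checkSameLength(a, b);
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    /** Remainder left on the child arc after the prefix moves up. */
    static long[] subtract(long[] value, long[] prefix) {
        checkSameLength(value, prefix);
        long[] out = new long[value.length];
        for (int i = 0; i < value.length; i++) {
            if (value[i] < prefix[i]) {
                throw new IllegalArgumentException("prefix exceeds value at index " + i);
            }
            out[i] = value[i] - prefix[i];
        }
        return out;
    }

    /** Restores the full long part while walking from the root to a term. */
    static long[] add(long[] prefix, long[] rest) {
        checkSameLength(prefix, rest);
        long[] out = new long[prefix.length];
        for (int i = 0; i < prefix.length; i++) {
            out[i] = prefix[i] + rest[i];
        }
        return out;
    }

    private static void checkSameLength(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("long parts must have the same fixed length");
        }
    }

    public static void main(String[] args) {
        // Two terms in sorted order; the values are made up for illustration.
        TermStateSketch smaller = new TermStateSketch(new long[] {100L, 40L}, new byte[0]);
        TermStateSketch larger = new TermStateSketch(new long[] {130L, 40L}, new byte[] {7});

        long[] shared = common(smaller.longs, larger.longs);
        long[] rest = subtract(larger.longs, shared);
        System.out.println("shared = " + Arrays.toString(shared));
        System.out.println("rest   = " + Arrays.toString(rest));
        System.out.println("full   = " + Arrays.toString(add(shared, rest)));
    }
}
```

      Because the long values only grow with larger terms, the minimum always equals the smaller term's values, which is why that output can move up without losing information. The byte part has no such ordering, so this sketch does not attempt to share it.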

        Attachments

        1. LUCENE-5029.algebra.patch
          56 kB
          Han Jiang
        2. LUCENE-5029.algebra.patch
          40 kB
          Han Jiang
        3. LUCENE-5029.branch-init.patch
          281 kB
          Han Jiang
        4. LUCENE-5029.patch
          23 kB
          Han Jiang
        5. LUCENE-5029.patch
          284 kB
          Han Jiang
        6. LUCENE-5029.patch
          283 kB
          Han Jiang
        7. LUCENE-5029.patch
          120 kB
          Han Jiang
        8. LUCENE-5029.patch
          8 kB
          Han Jiang

          Activity

            People

            • Assignee:
              billy Han Jiang
              Reporter:
              billy Han Jiang
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: