Lucene - Core / LUCENE-4227

DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, master
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This postings format just wraps Lucene40 (on disk), but at search
      time it loads all terms and postings into RAM up front.

      You'd use this if you have insane amounts of RAM and want the fastest
      possible search performance. The postings are not compressed: docIDs
      and positions are stored as plain int[]s.

      The terms are stored as a skip list (array of byte[]), but I packed
      all terms together into a single long byte[]. I had started with an
      actual separate byte[] per term, but the added pointer dereference
      and loss of locality made that a lot (~2X) slower for
      terms-dict-intensive queries like FuzzyQuery.
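
To illustrate the packing idea only (this is not the code from the patch), here is a minimal sketch that concatenates sorted terms into one byte[] plus an int[] of start offsets. The class and field names are hypothetical, and it uses a plain binary search over the offsets rather than the skip list mentioned above. It assumes the input terms are already sorted in UTF-8 byte order.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative, not the patch's actual classes.
final class PackedTerms {
    final byte[] termBytes;  // UTF-8 bytes of every term, concatenated in sorted order
    final int[] termOffsets; // term i occupies termBytes[termOffsets[i] .. termOffsets[i + 1])

    // Assumes sortedTerms is already sorted in UTF-8 byte order.
    PackedTerms(String[] sortedTerms) {
        byte[][] encoded = new byte[sortedTerms.length][];
        termOffsets = new int[sortedTerms.length + 1];
        int total = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            termOffsets[i] = total;
            total += encoded[i].length;
        }
        termOffsets[sortedTerms.length] = total;
        termBytes = new byte[total];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, termBytes, termOffsets[i], encoded[i].length);
        }
    }

    int size() {
        return termOffsets.length - 1;
    }

    // Compares the term at ord with target, byte by byte, treating bytes as unsigned.
    private int compare(int ord, byte[] target) {
        int start = termOffsets[ord];
        int len = termOffsets[ord + 1] - start;
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int diff = (termBytes[start + i] & 0xFF) - (target[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return len - target.length;
    }

    // Returns the term's ordinal, or -(insertionPoint) - 1 if it is absent.
    int find(String term) {
        byte[] target = term.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(mid, target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

Every comparison reads from the same contiguous array, so a lookup never chases a per-term object reference, which is the locality benefit described above.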

      Low frequency postings (docFreq <= 32 by default) store all docs,
      positions and offsets in a single int[]. High frequency postings
      store docs as int[], freqs as int[], and positions as int[][]
      parallel arrays. For skipping I just do a growing binary search.
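
One way to read "growing binary search" is an exponential (galloping) search over the sorted docs int[]: probe at doubling distances from the current index until a doc at or past the target is found, then binary search inside that window. The sketch below makes that assumption; `DirectSkipSketch` and `advanceIndex` are hypothetical names, and the patch may do this differently.

```java
// Hypothetical sketch of a growing (galloping) binary search over a sorted int[] of doc IDs.
final class DirectSkipSketch {

    // Returns the smallest index i >= from with docIDs[i] >= target,
    // or docIDs.length if no such entry exists.
    static int advanceIndex(int[] docIDs, int from, int target) {
        int n = docIDs.length;
        if (from >= n) {
            return n;
        }
        if (docIDs[from] >= target) {
            return from;
        }
        // Grow phase. Invariant: docIDs[lo] < target.
        int lo = from;
        int hi;
        int step = 1;
        while (true) {
            long probe = (long) lo + step; // long avoids int overflow on huge arrays
            if (probe >= n) {
                hi = n;
                break;
            }
            if (docIDs[(int) probe] >= target) {
                hi = (int) probe;
                break;
            }
            lo = (int) probe;
            step <<= 1;
        }
        // Binary search phase. Invariant: docIDs[lo] < target, and
        // either hi == n or docIDs[hi] >= target.
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (docIDs[mid] < target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi;
    }
}
```

Because the high frequency case keeps docs, freqs and positions as parallel arrays, the returned index could be used to read the matching freq and positions entries directly. Short skips cost only a few probes, while long skips still finish in logarithmic time.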

      I also made specialized DirectTermScorer and DirectExactPhraseScorer
      for the high freq case that just pull the int[] and iterate
      themselves.
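
As a rough picture of what "pull the int[] and iterate themselves" can look like, the sketch below intersects two terms' docs arrays and checks for adjacent positions with plain array loops. It only counts matching docs and does no scoring, so it is not the DirectExactPhraseScorer from the patch; the class and method names are hypothetical, and the docs/positions layout is assumed to be the parallel-array form described above.

```java
// Hypothetical sketch: exact two-term phrase matching directly over int[] postings.
final class DirectPhraseSketch {

    // Counts docs in which some position of term B equals a position of term A plus one.
    // docsA/docsB are sorted doc IDs; posA[i]/posB[j] are the sorted positions in that doc.
    static int countExactPhraseDocs(int[] docsA, int[][] posA, int[] docsB, int[][] posB) {
        int i = 0;
        int j = 0;
        int hits = 0;
        while (i < docsA.length && j < docsB.length) {
            if (docsA[i] < docsB[j]) {
                i++;
            } else if (docsA[i] > docsB[j]) {
                j++;
            } else {
                // Both terms occur in this doc; check whether they are adjacent.
                if (hasAdjacent(posA[i], posB[j])) {
                    hits++;
                }
                i++;
                j++;
            }
        }
        return hits;
    }

    // True if b contains a[k] + 1 for some k; both arrays must be sorted ascending.
    static boolean hasAdjacent(int[] a, int[] b) {
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int want = a[i] + 1;
            if (b[j] < want) {
                j++;
            } else if (b[j] > want) {
                i++;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

The inner loops touch only primitive arrays, with no per-posting decoding or iterator calls in between.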

      All tests pass.

      1. LUCENE-4227.patch
        81 kB
        Michael McCandless
      2. LUCENE-4227.patch
        74 kB
        Michael McCandless
      3. LUCENE-4227.patch
        91 kB
        Michael McCandless

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            2
