Lucene - Core / LUCENE-4227

DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      This postings format just wraps Lucene40 on disk, but at search
      time it loads all terms and postings into RAM up front.

      You'd use this if you have insane amounts of RAM and want the fastest
      possible search performance. The postings are not compressed: docIds,
      positions are stored as straight int[]s.

      The terms are stored as a skip list (array of byte[]), but I packed
      all terms together into a single long byte[]: I had started with an
      actual separate byte[] per term, but the added pointer dereference
      and loss of locality were a lot (~2X) slower for terms-dict
      intensive queries like FuzzyQuery.
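
To illustrate the packing idea, here is a minimal sketch, not the code from the patch. The class name PackedTermsSketch and its fields are hypothetical, the input is assumed to be sorted already in unsigned UTF-8 byte order, and the lookup is a plain binary search over an offsets array rather than the skip list described above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: sorted terms packed into one byte[] plus an offsets array. */
final class PackedTermsSketch {
    private final byte[] termBytes;   // all terms concatenated, in sorted order
    private final int[] termOffsets;  // term i occupies [termOffsets[i], termOffsets[i + 1])

    /** Assumes sortedTerms is already sorted in unsigned UTF-8 byte order. */
    PackedTermsSketch(String[] sortedTerms) {
        termOffsets = new int[sortedTerms.length + 1];
        byte[][] encoded = new byte[sortedTerms.length][];
        int total = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            termOffsets[i] = total;
            total += encoded[i].length;
        }
        termOffsets[sortedTerms.length] = total;
        // Copy every term into the single shared array.
        termBytes = new byte[total];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, termBytes, termOffsets[i], encoded[i].length);
        }
    }

    /** Returns the ordinal of target, or -1 if it is not present. */
    int lookup(byte[] target) {
        int lo = 0;
        int hi = termOffsets.length - 2; // index of the last term, or -1 if empty
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(mid, target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Unsigned byte-wise comparison of the term at ord against target. */
    private int compare(int ord, byte[] target) {
        int start = termOffsets[ord];
        int len = termOffsets[ord + 1] - start;
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int diff = (termBytes[start + i] & 0xFF) - (target[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return len - target.length;
    }
}
```

Every comparison reads from one contiguous array, so there is no per-term object to dereference, which is the locality benefit described above.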

      Low frequency postings (docFreq <= 32 by default) store all docs,
      positions and offsets in a single int[]. High frequency postings
      store docs as int[], freqs as int[], and positions as int[][]
      parallel arrays. For skipping I just do a growing binary search.
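
One way to read "growing binary search" is an exponential (galloping) search: probe at doubling distances from the current position, then binary search inside the last window. The sketch below assumes that reading; the class and method names are hypothetical and it is not taken from the patch.

```java
/** Hypothetical sketch of skipping over an uncompressed, sorted int[] of doc IDs. */
final class DirectAdvanceSketch {

    /**
     * Returns the smallest index >= from whose doc ID is >= target,
     * or docIDs.length if no such entry exists.
     */
    static int advance(int[] docIDs, int from, int target) {
        int n = docIDs.length;
        if (from >= n) {
            return n;
        }
        if (docIDs[from] >= target) {
            return from;
        }
        // Invariant from here on: docIDs[lo] < target.
        int lo = from;
        int step = 1;
        // Growing phase: double the probe distance while the probe is still too small.
        while (step < n - lo && docIDs[lo + step] < target) {
            lo += step;
            step <<= 1;
        }
        // Either the probe ran off the end, or docIDs[lo + step] >= target.
        int hi = (step < n - lo) ? lo + step : n;
        // Binary search in (lo, hi]: keep docIDs[lo] < target and (hi == n or docIDs[hi] >= target).
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (docIDs[mid] < target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi;
    }
}
```

Short skips, which are the common case when iterating, cost only a probe or two, while long skips still take logarithmic time.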

      I also made specialized DirectTermScorer and DirectExactPhraseScorer
      for the high freq case that just pull the int[] and iterate
      themselves.
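
As a rough illustration of what iterating the raw arrays looks like, the sketch below counts exact two-term phrase matches over the parallel docs and positions arrays. It is not the DirectExactPhraseScorer from the patch: the class names are hypothetical, scoring is left out, and the freqs array is omitted because positions[i].length already gives the frequency here.

```java
/** Hypothetical sketch: exact two-term phrase matching over uncompressed parallel arrays. */
final class DirectPhraseSketch {

    /** Postings for one high-frequency term. */
    static final class Postings {
        final int[] docIDs;      // sorted doc IDs
        final int[][] positions; // positions[i] = sorted positions of the term in docIDs[i]

        Postings(int[] docIDs, int[][] positions) {
            this.docIDs = docIDs;
            this.positions = positions;
        }
    }

    /** Returns the total number of times second directly follows first, across all docs. */
    static int countExactPhrase(Postings first, Postings second) {
        int i = 0;
        int j = 0;
        int total = 0;
        // Intersect the two sorted doc ID arrays.
        while (i < first.docIDs.length && j < second.docIDs.length) {
            int d1 = first.docIDs[i];
            int d2 = second.docIDs[j];
            if (d1 < d2) {
                i++;
            } else if (d1 > d2) {
                j++;
            } else {
                total += countAdjacent(first.positions[i], second.positions[j]);
                i++;
                j++;
            }
        }
        return total;
    }

    /** Counts positions p in a such that p + 1 occurs in b; both arrays are sorted ascending. */
    private static int countAdjacent(int[] a, int[] b) {
        int x = 0;
        int y = 0;
        int count = 0;
        while (x < a.length && y < b.length) {
            int want = a[x] + 1;
            if (b[y] < want) {
                y++;
            } else {
                if (b[y] == want) {
                    count++;
                }
                x++;
            }
        }
        return count;
    }
}
```

The loops touch only primitive arrays, with no enum or iterator objects in between, which is the kind of access the specialized scorers rely on.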

      All tests pass.

      Attachments

        1. LUCENE-4227.patch
          91 kB
          Michael McCandless
        2. LUCENE-4227.patch
          74 kB
          Michael McCandless
        3. LUCENE-4227.patch
          81 kB
          Michael McCandless

          People

            mikemccand Michael McCandless