  1. Lucene - Core
  2. LUCENE-4225

New FixedPostingsFormat for less overhead than SepPostingsFormat



    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed


      I've worked out the start of a new postings format that should have
      less overhead for fixed-int[] encoders (For, PFor), using ideas from
      the old bulk branch and new ideas from Robert.

      It's only a start: there's no payload support yet, and I haven't run
      Lucene's tests with it, except for one new test I added that tries to
      be a thorough PostingsFormat tester (to make it easier to create new
      postings formats). It does pass luceneutil's performance test, so
      it can at least run those queries correctly.

      Like Lucene40, it uses two files (though once we add payloads it may
      be three). The .doc file interleaves doc delta and freq blocks, and
      .pos has position delta blocks. Unlike Sep, blocks are NOT shared
      across terms; instead, it uses block encoding if there are enough ints
      to encode, else the same Lucene40 vInt format. This means low-freq
      terms (fewer than 128 ints, the current default block size) are always
      vInts, and high-freq terms will have some number of blocks, followed
      by a vInt final block.
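
      To make the block/vInt split concrete, here is a minimal Java sketch.
      It is not code from the patch: the class and method names are
      hypothetical, and treating an exact multiple of the block size as
      "no vInt tail" is an assumption. It only shows how a count of ints
      divides into full 128-int blocks plus a vInt-encoded remainder, and
      what a standard 7-bits-per-byte vInt looks like.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the block/vInt split; not taken from the patch. */
final class HybridLayoutSketch {
    // Default block size mentioned in the issue description.
    static final int BLOCK_SIZE = 128;

    /** Number of full blocks that would be block-encoded (For/PFor). */
    static int fullBlocks(int count) {
        return count / BLOCK_SIZE;
    }

    /** Number of trailing ints left over for the vInt-encoded tail. */
    static int vIntTail(int count) {
        return count % BLOCK_SIZE;
    }

    /** Variable-length int: 7 data bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes deltas[from..] as vInts, as the tail of a postings list would be in this sketch. */
    static byte[] encodeTail(int[] deltas, int from) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = from; i < deltas.length; i++) {
            writeVInt(out, deltas[i]);
        }
        return out.toByteArray();
    }
}
```

      Under these assumptions, a term with 127 ints has no full blocks and a
      127-int vInt tail, while a term with 300 ints has two full blocks and
      a 44-int vInt tail.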

      Skip points are only recorded at block starts.
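
      As an illustration of skipping at block granularity, the sketch below
      records one skip entry per block start and uses it to jump to the
      right block before scanning. This is an assumption-laden toy, not the
      patch's skip list: it works on a plain in-memory int[] of doc IDs, the
      names are hypothetical, and storing the first doc ID of each block as
      the skip entry is a choice made here for simplicity.

```java
/** Hypothetical sketch of block-aligned skip points; not taken from the patch. */
final class BlockSkipSketch {
    // Default block size mentioned in the issue description.
    static final int BLOCK_SIZE = 128;

    /** One skip entry per block start: the first doc ID stored in that block. */
    static int[] buildSkipPoints(int[] docIDs) {
        int numBlocks = (docIDs.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        int[] skip = new int[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            skip[b] = docIDs[b * BLOCK_SIZE];
        }
        return skip;
    }

    /** Returns the index of the first doc ID >= target, or docIDs.length if there is none. */
    static int advance(int[] docIDs, int[] skip, int target) {
        // Jump to the last block whose first doc ID is <= target.
        int block = 0;
        while (block + 1 < skip.length && skip[block + 1] <= target) {
            block++;
        }
        // Scan forward from that block start until the target is reached.
        int i = block * BLOCK_SIZE;
        while (i < docIDs.length && docIDs[i] < target) {
            i++;
        }
        return i;
    }
}
```

      Because entries exist only at block starts, a reader in this sketch
      never lands in the middle of a block and always begins decoding at a
      block boundary.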


        1. LUCENE-4225.patch
          126 kB
          Michael McCandless
        2. LUCENE-4225.patch
          91 kB
          Michael McCandless
        3. LUCENE-4225.patch
          90 kB
          Michael McCandless
        4. LUCENE-4225.patch
          95 kB
          Michael McCandless
        5. LUCENE-4225-on-rev-1362013.patch
          93 kB
          Han Jiang




              mikemccand Michael McCandless