Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      To accelerate posting list skips (TermDocs.skipTo(int)) Lucene uses skip lists.
      The default skip interval is set to 16. If we want to skip e.g. 100 documents,
      it is not necessary to read 100 entries from the posting list, but only
      floor(100/16) = 6 skip list entries plus 100 mod 16 = 4 entries from the
      posting list. This speeds up conjunction (AND) and phrase queries significantly.

      However, the skip interval is always a compromise. If you have a very big index
      with huge posting lists and you want to skip over, let's say, 100k documents,
      it is still necessary to read 100000/16 = 6250 entries from the skip list. For
      big indexes the skip interval could be set to a higher value, but then a long
      scan to the target doc might be necessary after a big skip.
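      The single-level arithmetic above can be sketched as a tiny helper. This is an
      illustration only, not Lucene code; the class and method names are made up.

```java
public class SingleLevelSkipCost {
    // Entries read when skipping `distance` documents ahead with one
    // skip level: one skip-list entry per skipInterval documents, plus
    // a linear scan over the remainder in the posting list.
    static int entriesRead(int distance, int skipInterval) {
        return distance / skipInterval + distance % skipInterval;
    }

    public static void main(String[] args) {
        System.out.println(entriesRead(100, 16));    // 6 + 4 = 10
        System.out.println(entriesRead(100000, 16)); // 6250 + 0 = 6250
    }
}
```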

      A solution for this compromise is to have multi-level skip lists that guarantee a
      logarithmic amount of skips to any target in the posting list. This patch
      implements such an approach in the following way:

      Example for skipInterval = 3:
      c (skip level 2)
      c c c (skip level 1)
      x x x x x x x x x x (skip level 0)
      d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d (posting list)
      3 6 9 12 15 18 21 24 27 30 (df)

      d - document
      x - skip data
      c - skip data with child pointer

      Skip level i contains every skipInterval-th entry from skip level i-1. Therefore
      the number of entries on level i is: floor(df / (skipInterval^(i + 1))).

      Each skip entry on a level i>0 contains a pointer to the corresponding skip entry in
      list i-1. This guarantees a logarithmic amount of skips to find the target document.
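      The level-size formula can be checked against the skipInterval=3 diagram above
      with a quick sketch (hypothetical helper names, not part of the patch):

```java
public class SkipLevelSizes {
    // Number of skip entries on level i: floor(df / skipInterval^(i+1)).
    static int entriesOnLevel(int df, int skipInterval, int i) {
        long interval = 1;
        for (int k = 0; k <= i; k++) interval *= skipInterval;
        return (int) (df / interval);
    }

    public static void main(String[] args) {
        int df = 32, skipInterval = 3; // matches the example diagram
        System.out.println(entriesOnLevel(df, skipInterval, 0)); // 10 'x' entries
        System.out.println(entriesOnLevel(df, skipInterval, 1)); // 3 'c' entries
        System.out.println(entriesOnLevel(df, skipInterval, 2)); // 1 'c' entry
    }
}
```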

      Implementation details:

      • I factored the skipping code out of SegmentMerger and SegmentTermDocs to
        simplify those classes. The two new classes AbstractSkipListReader and
        AbstractSkipListWriter implement the skipping functionality.
      • While AbstractSkipListReader and Writer take care of writing and reading the
        multiple skip levels, they do not implement an actual skip data format. The two
        new subclasses DefaultSkipListReader and Writer implement the skip data format
        that is currently used in Lucene (with two file pointers for the freq and prox
        file and with payload length information). I added this extra layer to be
        prepared for flexible indexing and different posting list formats.

      File format changes:

      • I added the new parameter 'maxSkipLevels' to the term dictionary and increased the
        version of this file. If maxSkipLevels is set to one, then the format of the freq
        file does not change at all, because we only have one skip level as before. For
        backwards compatibility maxSkipLevels is set to one automatically if an index
        without the new parameter is read.
      • If maxSkipLevels > 1, the frq file changes as follows:
        FreqFile (.frq) --> <TermFreqs, SkipData>^TermCount
        SkipData --> <<SkipLevelLength, SkipLevel>^(Min(maxSkipLevels,
        floor(log(DocFreq)/log(skipInterval))) - 1), SkipLevel>
        SkipLevel --> <SkipDatum>^(DocFreq/(skipInterval^(Level + 1)))

      Remark: The length of the SkipLevel is not stored for level 0, because 1) it is
      not needed, and 2) the format of this file then does not change at all for
      maxSkipLevels=1.
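      The number of skip levels implied by the grammar above can be sketched as
      follows. This is a hypothetical helper for illustration; a result of 0 means
      no skip data is stored for the term.

```java
public class NumSkipLevels {
    // Levels used for a term: at most maxSkipLevels, and no more than
    // floor(log(docFreq) / log(skipInterval)).
    static int numSkipLevels(int docFreq, int skipInterval, int maxSkipLevels) {
        int levels = (int) Math.floor(Math.log(docFreq) / Math.log(skipInterval));
        return Math.min(maxSkipLevels, levels);
    }

    public static void main(String[] args) {
        // df=32, skipInterval=3 as in the description's diagram: 3 levels.
        System.out.println(numSkipLevels(32, 3, 10)); // 3
        // df=1: no skip data at all.
        System.out.println(numSkipLevels(1, 3, 10));  // 0
    }
}
```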

      All unit tests pass with this patch.

      1. lucene-866.patch
        45 kB
        Michael Busch
      2. lucene-866.patch
        46 kB
        Michael Busch
      3. fileformats.patch
        6 kB
        Michael Busch

          Activity

          Michael Busch added a comment -

          Committed.

          Michael Busch added a comment -

          This patch updates the fileformats document. It also adds a comment
          saying that SkipDelta is only written to the tis file when DocFreq is not
          smaller than SkipInterval (as recently mentioned by Jeff in LUCENE-349).

          Yonik Seeley added a comment -

          Nice job!
          Starting at the lowest level will probably improve the worst-case scenario for this patch (which wasn't nearly as bad as I expected in the first place).

          +1 to commit

          Michael Busch added a comment -

          Thanks for reviewing, Paul and Doron, and thanks for fixing the NPE, Mike!

          I'm planning to commit this in a day or so.

          Paul Elschot added a comment -

          The last change to BufferedIndexInput indeed makes all core tests pass again here with this patch applied.

          Michael McCandless added a comment -

          Hmm – I think that NullPointerException is from the assert I added for LUCENE-888. I will fix.

          Doron Cohen added a comment -

          On a clean checkout it applies cleanly and this test (and all others) pass.

          Paul Elschot added a comment -

          Patch applies cleanly here.
          All tests pass, except for org.apache.lucene.store.TestBufferedIndexInput:

          [junit] Testsuite: org.apache.lucene.store.TestBufferedIndexInput
          [junit] Tests run: 4, Failures: 0, Errors: 1, Time elapsed: 2.536 sec
          [junit]
          [junit] Testcase: testSetBufferSize(org.apache.lucene.store.TestBufferedIndexInput): Caused an ERROR
          [junit] null
          [junit] java.lang.NullPointerException
          [junit] at org.apache.lucene.store.BufferedIndexInput.setBufferSize(BufferedIndexInput.java:52)
          [junit] at org.apache.lucene.store.TestBufferedIndexInput$MockFSDirectory.tweakBufferSizes(TestBufferedIndexInput.java:210)
          [junit] at org.apache.lucene.store.TestBufferedIndexInput.testSetBufferSize(TestBufferedIndexInput.java:164)
          [junit]
          [junit]
          [junit] Test org.apache.lucene.store.TestBufferedIndexInput FAILED

          It's possible that this is due to another diff in my working copy, but a quick look
          did not reveal anything suspicious. In case I'm the only one with this test failing, I'll dig deeper.

          Michael Busch added a comment -

          New version of the patch with the following changes:

          • Applies cleanly on current trunk
          • Renamed AbstractSkipListReader and AbstractSkipListWriter
            to MultiLevelSkipListReader and MultiLevelSkipListWriter
          • The skipTo() algorithm now starts with the lowest level
            and walks up until it finds the highest level that has a
            skip for the target.
          • Uses the new BufferedIndexInput.setBufferSize() method to
            reduce memory consumption in case skip levels occupy less
            space than BufferedIndexInput.BUFFER_SIZE (1024 bytes)
          • Only preload the top level into memory. That saves memory
            and has almost no effect on performance.

          All unit tests pass.

          I think this patch is ready to commit now.
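          The bottom-up level selection described in this comment can be sketched as
          follows. skipDoc is a hypothetical array holding the doc id of the next skip
          point on each level; the actual MultiLevelSkipListReader also tracks file
          pointers and child pointers, which are omitted here.

```java
public class BottomUpSkip {
    // Walk up from level 0 until the next level's skip point would
    // overshoot the target. Higher levels are never touched for small
    // skips, which keeps the worst case (many tiny skips) cheap.
    static int startLevel(int target, int[] skipDoc) {
        int level = 0;
        while (level + 1 < skipDoc.length && skipDoc[level + 1] <= target) {
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        int[] skipDoc = {3, 9, 27}; // next skip points, skipInterval=3
        System.out.println(startLevel(5, skipDoc));   // 0: small skip
        System.out.println(startLevel(12, skipDoc));  // 1: medium skip
        System.out.println(startLevel(100, skipDoc)); // 2: big skip
    }
}
```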

          Michael Busch added a comment -

          I ran some more performance experiments to test the average speedup.
          I used the same index (about 1.2GB, optimized, docs from wikipedia)
          and created 50,000 random queries. Each query has 3 AND terms and
          each term has a df > 100.

          Here are the results:

          old (one level skipping):
          Time: 62141 ms.
          VInt reads: 752430441

          new (multi level):
          Time: 51969 ms.
          VInt reads: 435504734

          This is a speedup of about 16% and i/o savings of 42%.

          Then I changed the algorithm a bit to not always start on the
          highest skip level to find the next skip but to start at level 0
          and walk up until the next skip point is greater than the target.
          This speeds up things further so that the overall gain is about
          20%:

          Time: 49500 ms.
          Bytes read: 435504734

          This 20% speedup is for average AND queries. Certain queries
          benefit even more from multi-level skipping as the results
          show that I attached earlier.

          I will attach a new patch as soon as LUCENE-888 is committed.

          Michael Busch added a comment -

          LUCENE-888 will introduce the new method BufferedIndexInput.setBufferSize(). Since the length of the different skip levels is known I can set the buffer length before I clone the skip stream in this patch.

          Michael Busch added a comment -

          > and worst case would be skipTo(currDoc+2) in a loop
          > (or a conjunction across terms that appear in almost all docs).

          True, that should be the worst case. A query with 3 stop words
          takes about 2% longer:

          Query: +the +and +a
          682737 total matching documents
          VInt reads: old: 481897700, new: 481416800, -0.09979296435737295%
          Time: old: 27812ms, new: 28406ms, 2.1357687329210413%

          A possible optimization could be to avoid using the higher levels
          after a certain number of small skips have been performed. I will
          try out whether this improves things.

          > do you avoid storing extra levels when the docfreq is small?
          > What is the size cost for terms with docfreq=1?

          I only store another level if it contains at least one SkipDatum.
          So there is no overhead for terms with df=1.

          The .frq file grew by 1.3% in the multi-level case, for an index
          of about 170MB.

          Yonik Seeley added a comment -

          Looks interesting!

          Have you done any performance testing? I guess best case for this patch would be "skip to a random doc", and worst case would be skipTo(currDoc+2) in a loop (or a conjunction across terms that appear in almost all docs).

          I haven't dug into the code, but do you avoid storing extra levels when the docfreq is small?
          What is the size cost for terms with docfreq=1?

          Michael Busch added a comment -

          I ran a few performance tests with this patch. I built a rather big
          index (about 1.2GB) using documents from Wikipedia and optimized the
          index to get big posting lists.

          I compare query evaluation time with one-level skipping (old) and
          multi-level skipping (new) and measure the amount of I/O by counting
          the number of VInts read from disk. Each query runs several thousand
          times.

          The following queries contain very frequent and very unique terms.
          For these queries the speedup with multi-level skipping is
          significant:

          Query: +lucene +search +engine +library
          5 total matching documents
          VInt reads: old: 143268000, new: 3933000, -97.25479520897898%
          Time: old: 7234ms, new: 1157ms, -84.00608238871993%

          Query: +apache +http +server
          181 total matching documents
          VInt reads: old: 155892000, new: 27849000, -82.13570933723346%
          Time: old: 10656ms, new: 5703ms, -46.48085585585586%

          Even though I/O is reduced for the next query, it runs a bit slower.
          I believe the reason is that the same query runs several thousand
          times, so the posting lists are loaded into the file system cache;
          this reduces the benefit of less I/O, while the skipping algorithm
          itself is a bit more complex:

          Query: +multi +level +skipping
          13 total matching documents
          VInt reads: old: 42894000, new: 39096000, -8.854385228703315%
          Time: old: 3875ms, new: 3922ms, 1.2129032258064516%

          For the next query slightly more I/O is necessary in the
          multi-level case. This is because not many big skips can be made,
          but more skip data has to be read. The top 2 skip levels are
          buffered in this first version of the patch, and if no big skips
          can be made then this buffering is overhead compared to the
          single-level case:

          Query: +beatles +yellow +submarine
          78 total matching documents
          VInt reads: old: 38460000, new: 38685000, 0.5850234009360374%
          Time: old: 3172ms, new: 3265ms, 2.9319041614123584%

          However, if I change the query a little bit, then the speed-up
          is significant (due to the very frequent stop word "the"):

          Query: +"the beatles" +yellow +submarine
          77 total matching documents
          VInt reads: old: 382307000, new: 262331000, -31.38210914265237%
          Time: old: 26703ms, new: 22828ms, -14.51147811107366%

          It would be interesting to run more sophisticated benchmarks.
          To run the same query several times is not very realistic
          because it reduces the effects of the I/O savings due to caching.

          I'm not that familiar with the new benchmark stuff that has
          been added recently, but I'll try to dig into that next week.

          Michael Busch added a comment -

          Attaching the first version of this patch.


            People

            • Assignee:
              Michael Busch
              Reporter:
              Michael Busch
            • Votes:
              2
              Watchers:
              0
