Lucene - Core
  1. Lucene - Core
  2. LUCENE-4443

BlockPostingsFormat writes unnecessary skipdata

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 5.0
    • Component/s: core/codecs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Seems to me lastStartOffset is unnecessary, when we skip to a document, it implicitly is 0: see BlockPostingsWriter.startDoc.

      (Unless I'm missing something, all tests pass with "Block" if i remove it)

      Separately we should really think about lastPayloadByteUpto, is this worth it? instead when we actually skip, we could sum the payloadLengthBuffer from 0..curPosBufferUpto as we are going to decode that block anyway?

        Activity

        Hide
        Michael McCandless added a comment -

        +1 to nuke lastStartOffset: it's pointless!

        I'm torn on nuking lastPayloadByteUpto ...

        Show
        Michael McCandless added a comment - +1 to nuke lastStartOffset: it's pointless! I'm torn on nuking lastPayloadByteUpto ...
        Hide
        Robert Muir added a comment -

        Me too, lets leave that be. For now I committed this patch.

        I bumped the codec header version so anyone testing Block already gets a clean error.

        Show
        Robert Muir added a comment - Me too, lets leave that be. For now I committed this patch. I bumped the codec header version so anyone testing Block already gets a clean error.
        Hide
        Mark Miller added a comment -

        Shouldn't we push this into 4.0 rather than ship 4.0 and already know you have to reindex if you use this?

        Show
        Mark Miller added a comment - Shouldn't we push this into 4.0 rather than ship 4.0 and already know you have to reindex if you use this?
        Hide
        Robert Muir added a comment -

        Well this is an experimental codec (not the default format), and its likely we might change it anyway.

        I really don't like the idea of backporting optimizations to the release branch.

        Nobody has to reindex if they dont want anyway, they can use addIndexes or whatever to switch their index
        back to the lucene's official index format, upgrade to 4.1, and then switch back to the new block.

        Show
        Robert Muir added a comment - Well this is an experimental codec (not the default format), and its likely we might change it anyway. I really don't like the idea of backporting optimizations to the release branch. Nobody has to reindex if they dont want anyway, they can use addIndexes or whatever to switch their index back to the lucene's official index format, upgrade to 4.1, and then switch back to the new block.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1391479

        LUCENE-4443: don't write unnecessary skipdata in BlockSkipWriter

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1391479 LUCENE-4443 : don't write unnecessary skipdata in BlockSkipWriter

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development