Lucene - Core: LUCENE-4248

Producers to the Codec API don't always follow the spec

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We added AssertingCodec etc. and have lots of tests checking that consumers of the codec API follow a strict set of rules, but nothing checks the producers feeding these APIs (IndexWriter, codec merge implementations, etc.).

      We should beef up AssertingCodec to validate these producers too; this way we know the API is being followed on both sides.

      Simple examples include checking that producers feed terms to the consumers in an order consistent with their comparator, that they don't provide bogus or out-of-band statistics, and that they invoke the right methods consistently (e.g. not forgetting to call finishDoc or something else that might confuse someone's codec).
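As a rough illustration of the checks listed above, here is a minimal Java sketch of a validating decorator. It is not the AssertingCodec code from the patch: `TermsSink` and `CheckingTermsSink` are hypothetical names, terms are plain `String`s instead of Lucene's `BytesRef`, and the real Lucene 4.x `TermsConsumer` signatures differ. It rejects terms that arrive out of comparator order, per-term statistics that are out of band, and field totals that disagree with the per-term statistics it has seen. It throws explicit exceptions rather than using Java `assert`, so the checks run even without `-ea`.

```java
import java.util.Comparator;

// Hypothetical, simplified stand-in for a codec's terms consumer.
// Terms are Strings here instead of Lucene's BytesRef.
interface TermsSink {
  void startTerm(String term);
  void finishTerm(String term, int docFreq, long totalTermFreq);
  void finish(long sumTotalTermFreq, long sumDocFreq, int docCount);
}

// Decorator that validates what a producer feeds to the wrapped sink.
final class CheckingTermsSink implements TermsSink {
  private final TermsSink in;
  private final Comparator<String> comparator;
  private final boolean hasFreqs; // false => totalTermFreq must be -1
  private String lastTerm = null;
  private long sumDocFreq = 0;
  private long sumTotalTermFreq = 0;

  CheckingTermsSink(TermsSink in, Comparator<String> comparator, boolean hasFreqs) {
    this.in = in;
    this.comparator = comparator;
    this.hasFreqs = hasFreqs;
  }

  @Override
  public void startTerm(String term) {
    // Terms must arrive in strictly increasing order under the consumer's comparator.
    if (lastTerm != null && comparator.compare(lastTerm, term) >= 0) {
      throw new IllegalStateException("term out of order: " + term + " after " + lastTerm);
    }
    lastTerm = term;
    in.startTerm(term);
  }

  @Override
  public void finishTerm(String term, int docFreq, long totalTermFreq) {
    if (docFreq <= 0) {
      throw new IllegalStateException("docFreq must be > 0, got " + docFreq);
    }
    if (hasFreqs) {
      // Every doc containing the term contributes at least one occurrence.
      if (totalTermFreq < docFreq) {
        throw new IllegalStateException("totalTermFreq " + totalTermFreq + " < docFreq " + docFreq);
      }
      sumTotalTermFreq += totalTermFreq;
    } else if (totalTermFreq != -1) {
      throw new IllegalStateException("totalTermFreq must be -1 when freqs are omitted");
    }
    sumDocFreq += docFreq;
    in.finishTerm(term, docFreq, totalTermFreq);
  }

  @Override
  public void finish(long sumTotalTermFreq, long sumDocFreq, int docCount) {
    // Field-level totals must match what the producer reported per term.
    if (sumDocFreq != this.sumDocFreq) {
      throw new IllegalStateException("bogus sumDocFreq " + sumDocFreq);
    }
    long expectedTtf = hasFreqs ? this.sumTotalTermFreq : -1;
    if (sumTotalTermFreq != expectedTtf) {
      throw new IllegalStateException("bogus sumTotalTermFreq " + sumTotalTermFreq);
    }
    // docCount counts distinct docs, so it cannot exceed the sum of per-term doc counts.
    if (docCount < 0 || docCount > sumDocFreq) {
      throw new IllegalStateException("bogus docCount " + docCount);
    }
    in.finish(sumTotalTermFreq, sumDocFreq, docCount);
  }
}
```

Wrapping rather than subclassing keeps the checks independent of whichever consumer is underneath, which is the same idea as running every codec through an asserting layer in tests.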

      This is also nice because we now have quite a few tests (TestCodecs, TestPostingsFormat, etc.) that feed these APIs directly, so it could find some bugs in the tests themselves.

      1. LUCENE-4248.patch (4 kB, Robert Muir)
      2. LUCENE-4248.patch (8 kB, Robert Muir)
      3. LUCENE-4248.patch (9 kB, Robert Muir)
      4. LUCENE-4248.patch (12 kB, Robert Muir)

        Activity

        Robert Muir added a comment -

        The start of a patch: some tests still fail.

        I figure we can get everything cleaned up for postings first and then, if we feel like it later, add checks for the other parts of the codec API.

        Robert Muir added a comment -

        Updated patch: fixing some more bugs in these producers.

        I also added a simple state machine, but because of the rule that "startTerm without a corresponding finishTerm is allowed if all docs are deleted for that term", the check is not that strong right now.

        Once we add an AssertingPostingsConsumer of some sort, we can actually validate that no docs were added in that case, and I think it will be fine.

        But I'd like to commit this for now as a start.
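The relaxed rule about abandoned terms is why a bare state machine is weak here: seeing startTerm twice in a row is legal or not depending on whether any documents were fed for the first term. Below is a minimal sketch of how counting the documents the postings side receives makes that rule checkable, roughly the role the comment assigns to an AssertingPostingsConsumer. `TermLifecycleChecker` is a hypothetical name, and the rule that a finished term must have at least one document is an assumption of this sketch, not something taken from the patch.

```java
// Hypothetical lifecycle checker for one field's terms. A term may be started
// and then abandoned (no finishTerm) only if the producer fed zero documents
// for it, e.g. because all of its docs were deleted.
final class TermLifecycleChecker {
  private enum State { BETWEEN_TERMS, IN_TERM, FINISHED }

  private State state = State.BETWEEN_TERMS;
  private int docsInCurrentTerm = 0;

  void startTerm() {
    if (state == State.FINISHED) {
      throw new IllegalStateException("startTerm after finish");
    }
    // Abandoning the previous term is only legal if it received no docs.
    if (state == State.IN_TERM && docsInCurrentTerm > 0) {
      throw new IllegalStateException("previous term had docs but was never finished");
    }
    state = State.IN_TERM;
    docsInCurrentTerm = 0;
  }

  // Called whenever the postings consumer for the current term gets a doc.
  void startDoc() {
    if (state != State.IN_TERM) {
      throw new IllegalStateException("startDoc outside a term");
    }
    docsInCurrentTerm++;
  }

  void finishTerm() {
    if (state != State.IN_TERM) {
      throw new IllegalStateException("finishTerm without startTerm");
    }
    if (docsInCurrentTerm == 0) {
      throw new IllegalStateException("finishTerm for a term with no docs");
    }
    state = State.BETWEEN_TERMS;
  }

  void finish() {
    if (state == State.FINISHED) {
      throw new IllegalStateException("finish called twice");
    }
    if (state == State.IN_TERM && docsInCurrentTerm > 0) {
      throw new IllegalStateException("last term had docs but was never finished");
    }
    state = State.FINISHED;
  }
}
```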

        Robert Muir added a comment -

        One more check, and also a fix for a bad assert in the BlockTree writer.

        Robert Muir added a comment -

        Here's a patch for the rest of the postings API.

        FreqProxTermsWriter was inconsistent here (depending upon when the omitTF bit got set in the indexing process).

        I added javadocs to these APIs clarifying that these values (freq, offsets, etc.) are all -1 when they are not being indexed.
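A minimal sketch of validating the "-1 when not indexed" convention described above. `PostingsValueChecks` is a hypothetical helper, and the exact bounds (freq >= 1 when frequencies are indexed, 0 <= startOffset <= endOffset when offsets are indexed) are assumptions of this sketch. A check like this catches a producer whose values depend on indexing-time state, for example one that passes a real frequency for a field whose omitTF bit is set.

```java
// Hypothetical validators for the "-1 when not indexed" convention.
final class PostingsValueChecks {
  private PostingsValueChecks() {}

  // freq passed to startDoc: -1 if term frequencies are omitted, else >= 1.
  static void checkFreq(int freq, boolean indexesFreqs) {
    boolean ok = indexesFreqs ? freq >= 1 : freq == -1;
    if (!ok) {
      throw new IllegalArgumentException(
          "bad freq " + freq + " (indexesFreqs=" + indexesFreqs + ")");
    }
  }

  // Offsets passed with a position: both -1 if offsets are not indexed,
  // otherwise 0 <= startOffset <= endOffset.
  static void checkOffsets(int startOffset, int endOffset, boolean indexesOffsets) {
    boolean ok = indexesOffsets
        ? (startOffset >= 0 && startOffset <= endOffset)
        : (startOffset == -1 && endOffset == -1);
    if (!ok) {
      throw new IllegalArgumentException(
          "bad offsets [" + startOffset + ", " + endOffset + "] (indexesOffsets=" + indexesOffsets + ")");
    }
  }
}
```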

        TestCodecs didn't call finishDoc(); other than that, things look good.
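A forgotten finishDoc() is easy to catch once the per-document call sequence is tracked. A minimal sketch, using a hypothetical `DocSequenceChecker` and assuming that when positions are indexed the producer makes exactly `freq` addPosition calls per document: a missing finishDoc() surfaces at the next startDoc() or, for the term's last document, at finishTerm().

```java
// Hypothetical checker for the per-document calls a producer makes on a
// postings consumer: startDoc(docID, freq), then (if positions are indexed)
// exactly freq addPosition calls, then finishDoc().
final class DocSequenceChecker {
  private final boolean indexesPositions;
  private boolean inDoc = false;
  private int lastDocID = -1;
  private int expectedPositions = 0;
  private int seenPositions = 0;

  DocSequenceChecker(boolean indexesPositions) {
    this.indexesPositions = indexesPositions;
  }

  void startDoc(int docID, int freq) {
    // A missing finishDoc() for the previous doc is caught here...
    if (inDoc) {
      throw new IllegalStateException("startDoc(" + docID + ") before finishDoc() of doc " + lastDocID);
    }
    // ...and docIDs must strictly increase within a term.
    if (docID <= lastDocID) {
      throw new IllegalStateException("docID " + docID + " is not greater than " + lastDocID);
    }
    inDoc = true;
    lastDocID = docID;
    expectedPositions = indexesPositions ? freq : 0;
    seenPositions = 0;
  }

  void addPosition(int position) {
    if (!inDoc) throw new IllegalStateException("addPosition outside a doc");
    if (!indexesPositions) throw new IllegalStateException("positions are not indexed");
    if (position < 0) throw new IllegalStateException("negative position " + position);
    seenPositions++;
  }

  void finishDoc() {
    if (!inDoc) throw new IllegalStateException("finishDoc without startDoc");
    if (seenPositions != expectedPositions) {
      throw new IllegalStateException("expected " + expectedPositions + " positions, got " + seenPositions);
    }
    inDoc = false;
  }

  // ...or here, if the missing finishDoc() was for the term's last doc.
  void finishTerm() {
    if (inDoc) throw new IllegalStateException("finishTerm before finishDoc() of doc " + lastDocID);
    lastDocID = -1;
  }
}
```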


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            1

