Lucene - Core / LUCENE-6207

Multiple filtered subsets of the same underlying index passed to IW.addIndexes() can produce an index with bad SortedDocValues

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.9, 4.9.1, 4.10, 4.10.1, 4.10.2, 4.10.3
    • Fix Version/s: 4.10.4, 5.0, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We were hit by this in a custom index splitter implementation that showed no problems with Lucene 4.8. After upgrading to 4.10, documents started having wrong SortedDocValues after splitting.

      1. LUCENE-6207.patch
        11 kB
        Adrien Grand
      2. LUCENE-6207.patch
        11 kB
        Adrien Grand
      3. Lucene6207Test.java
        5 kB
        Robert Muir
      4. Lucene6207Test.java
        5 kB
        TomShally

        Activity

        TomShally added a comment -

        Test showing the issue

        Robert Muir added a comment -

        Thanks for taking the time to write the test.

        Attached is the same test, ported to trunk. It splits the index in a slow way that forces the test to still fail, since the bug still exists on trunk.

        This is a byte[] reuse bug introduced by LUCENE-5703. The test passes on the revision just before that commit and fails on every revision after it.
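
        To illustrate what a byte[] reuse bug looks like in general, here is a minimal sketch. It is not Lucene code: Slice and ReusingDictionary are hypothetical stand-ins invented for this example, and the sketch assumes every term fits in 16 bytes. It only shows how a caller that holds on to a reused buffer sees it change after the next lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedBufferDemo {
    /** Minimal reusable byte slice (hypothetical, not Lucene's BytesRef). */
    static final class Slice {
        byte[] bytes = new byte[16]; // assumption: terms are at most 16 bytes
        int length;

        String utf8() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }

        /** Deep copy, so the result no longer aliases the shared buffer. */
        Slice copy() {
            Slice c = new Slice();
            c.bytes = Arrays.copyOf(bytes, bytes.length);
            c.length = length;
            return c;
        }
    }

    /** Hypothetical dictionary that writes every looked-up term into one shared Slice. */
    static final class ReusingDictionary {
        private final String[] terms;
        private final Slice scratch = new Slice();

        ReusingDictionary(String... terms) {
            this.terms = terms;
        }

        /** The returned Slice is overwritten by the next call to lookup. */
        Slice lookup(int ord) {
            byte[] encoded = terms[ord].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(encoded, 0, scratch.bytes, 0, encoded.length);
            scratch.length = encoded.length;
            return scratch;
        }
    }

    public static void main(String[] args) {
        ReusingDictionary dict = new ReusingDictionary("apple", "banana");
        Slice first = dict.lookup(0);
        Slice safe = first.copy();          // defensive copy taken before the next lookup
        dict.lookup(1);                     // overwrites the shared buffer
        System.out.println(first.utf8());   // prints "banana": the held reference was clobbered
        System.out.println(safe.utf8());    // prints "apple": the copy is unaffected
    }
}
```

        Reusing one buffer avoids an allocation per lookup, which is why the pattern is attractive; the cost is that every caller must either finish with the value or copy it before anything else touches the same instance.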

        Adrien Grand added a comment -

        The bug is that doc values terms enums assume that nothing else is going to use the underlying dv instance at the same time. Here is a patch that makes the attached test case pass. I also added tests to the base dv format test case and to the Lucene50 one to test the compressed terms dict.
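
        The assumption described above can be sketched with a toy prefix-compressed terms dictionary. This is only an illustration, not the actual Lucene code or the attached patch: CompressedTerms, TermCursor, and their methods are hypothetical names made up for this example. Two cursors over the same instance are advanced out of step; when they share the instance's decoding buffer, the later cursor decodes against the wrong previous term.

```java
public class TermsEnumIsolationDemo {
    /** Hypothetical prefix-compressed terms dictionary (not Lucene's on-disk format). */
    static final class CompressedTerms {
        final int[] prefixLengths; // characters shared with the previous term
        final String[] suffixes;   // remainder of each term
        // Decoding buffer stored on the instance; shared by all cursors in the buggy variant.
        private final StringBuilder sharedBuffer = new StringBuilder();

        CompressedTerms(int[] prefixLengths, String[] suffixes) {
            this.prefixLengths = prefixLengths;
            this.suffixes = suffixes;
        }

        /** Buggy variant: every cursor decodes into the instance-level buffer. */
        TermCursor sharedStateCursor() {
            return new TermCursor(this, sharedBuffer);
        }

        /** Isolated variant: every cursor owns a private decoding buffer. */
        TermCursor privateStateCursor() {
            return new TermCursor(this, new StringBuilder());
        }
    }

    static final class TermCursor {
        private final CompressedTerms dict;
        private final StringBuilder buffer;
        private int ord = -1;

        TermCursor(CompressedTerms dict, StringBuilder buffer) {
            this.dict = dict;
            this.buffer = buffer;
        }

        /** Decodes the next term relative to the buffer's current content; null at the end. */
        String next() {
            if (ord + 1 >= dict.suffixes.length) {
                return null;
            }
            ord++;
            buffer.setLength(dict.prefixLengths[ord]);
            buffer.append(dict.suffixes[ord]);
            return buffer.toString();
        }
    }

    /** Encodes the sorted terms "apple", "apply", "banana", "band". */
    static CompressedTerms sampleTerms() {
        return new CompressedTerms(
            new int[] {0, 4, 0, 3},
            new String[] {"apple", "y", "banana", "d"});
    }

    /** Advances the two cursors out of step and returns the late cursor's second term. */
    static String secondTermOfLateCursor(TermCursor early, TermCursor late) {
        early.next();
        early.next();
        late.next();
        early.next();
        return late.next();
    }

    public static void main(String[] args) {
        CompressedTerms dict = sampleTerms();
        // prints "banay": the late cursor decoded on top of the early cursor's "banana"
        System.out.println(secondTermOfLateCursor(dict.sharedStateCursor(), dict.sharedStateCursor()));
        // prints "apply": each cursor decodes against its own previous term
        System.out.println(secondTermOfLateCursor(dict.privateStateCursor(), dict.privateStateCursor()));
    }
}
```

        In this toy model the remedy is to give each cursor its own decoding state. Whether the real patch takes that route or another one is best checked against the attached LUCENE-6207.patch.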

        Adrien Grand added a comment -

        Same patch; I just added an entry in CHANGES.txt and fixed the test to remove an abusive "assumeTrue". For now the entry is under 5.1, but I'm wondering if we should try to put it in 5.0?

        Robert Muir added a comment -

        +1!

        Michael McCandless added a comment -

        +1 for patch and +1 to fix this for 5.0.0

        ASF subversion and git services added a comment -

        Commit 1655693 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1655693 ]

        LUCENE-6207: Fixed consumption of several terms enums on the same sorted (set) doc values instance.

        ASF subversion and git services added a comment -

        Commit 1655705 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1655705 ]

        LUCENE-6207: Fixed consumption of several terms enums on the same sorted (set) doc values instance.

        ASF subversion and git services added a comment -

        Commit 1655710 from Adrien Grand in branch 'dev/branches/lucene_solr_5_0'
        [ https://svn.apache.org/r1655710 ]

        LUCENE-6207: Fixed consumption of several terms enums on the same sorted (set) doc values instance.

        ASF subversion and git services added a comment -

        Commit 1655718 from Adrien Grand in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1655718 ]

        LUCENE-6207: Fixed consumption of several terms enums on the same sorted (set) doc values instance.

        Adrien Grand added a comment -

        Thanks Tom for the patch, it was really helpful for digging into this issue!

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee:
            Adrien Grand
          • Reporter:
            TomShally
          • Votes:
            0
          • Watchers:
            5
