Lucene - Core
  1. Lucene - Core
  2. LUCENE-6019

IndexWriter allows to add same field with different docvlaues type

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.10.1
    • Fix Version/s: 4.10.2, 5.0, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexWriter checks if the DV types are consitent in multiple places but if due to some problems in Elasticsearch users where able to add the same field with different DV types causing merges to fail. Yet I was able to reduce this to a lucene testcase but I was puzzled since it always failed. Yet, I had to run it without assertions and that cause the bug to happen. I can add field foo with BINARY and SORTED_SET causing a merge to fail. Here is a gist https://gist.github.com/s1monw/8707f924b76ba40ee5f3 / https://github.com/elasticsearch/elasticsearch/issues/8009

      While this is certainly a problem in Elasticsearch Lucene also allows to corrupt an index due to user error which I think should be prevented. NOTE: this only fails if you run without assertions which I think lucene should do in CI once in a while too.

      1. LUCENE-6019.patch
        24 kB
        Michael McCandless
      2. LUCENE-6019.patch
        23 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Grrrrrr!

        DefaultIndexingChain has this nice (wrong) comment:

            // This will throw an exc if the caller tried to
            // change the DV type for the field:
            fp.fieldInfo.setDocValuesType(dvType);
            if (hasDocValues == false) {
              fieldInfos.globalFieldNumbers.setDocValuesType(fp.fieldInfo.number, fp.fieldInfo.name, dvType);
            }
        

        ... it's wrong because in FieldInfos.setDocValuesType we have this:

            synchronized void setDocValuesType(int number, String name, DocValuesType dvType) {
              assert containsConsistent(number, name, dvType);
              docValuesType.put(name, dvType);
            }
        

        So indeed it only throws exc when assertions are enabled.

        NOTE: this only fails if you run without assertions which I think lucene should do in CI once in a while too.

        +1, I think we should do this first. We have existing tests that would have caught this bug had we already done this...

        Show
        Michael McCandless added a comment - Grrrrrr! DefaultIndexingChain has this nice (wrong) comment: // This will throw an exc if the caller tried to // change the DV type for the field: fp.fieldInfo.setDocValuesType(dvType); if (hasDocValues == false) { fieldInfos.globalFieldNumbers.setDocValuesType(fp.fieldInfo.number, fp.fieldInfo.name, dvType); } ... it's wrong because in FieldInfos.setDocValuesType we have this: synchronized void setDocValuesType(int number, String name, DocValuesType dvType) { assert containsConsistent(number, name, dvType); docValuesType.put(name, dvType); } So indeed it only throws exc when assertions are enabled. NOTE: this only fails if you run without assertions which I think lucene should do in CI once in a while too. +1, I think we should do this first. We have existing tests that would have caught this bug had we already done this...
        Hide
        Michael McCandless added a comment -

        OK one correction: even with assertions disabled, no existing Lucene test fails. What's salient about Simon's test is 1) it opens a new IW, 2) the field is first added as a "not doc values" field (StringField). Then when it's added also as a DocValuesField, it follows a different code path that relies on assert.

        Anyway I'll add Simon's test here, and upgrade the assert to a real check. And I want to fix Lucene's tests to sometimes run w/o asserts.

        Show
        Michael McCandless added a comment - OK one correction: even with assertions disabled, no existing Lucene test fails. What's salient about Simon's test is 1) it opens a new IW, 2) the field is first added as a "not doc values" field (StringField). Then when it's added also as a DocValuesField, it follows a different code path that relies on assert. Anyway I'll add Simon's test here, and upgrade the assert to a real check. And I want to fix Lucene's tests to sometimes run w/o asserts.
        Hide
        Michael McCandless added a comment -

        Patch with new (at first failing) test case, and fix to throw an IllegalArgumentException when the invalid document is added, not at merge time after index is already corrupted.

        I also added some missing fail() in TestDocValuesIndexing.

        I also turned off test assertions and fixed tests that relied on them, e.g. by upgrading IndexWriter test points to run w/o asserts, and removing some test points. I put one nocommit about whether the assert we use to verify TokenStream is final should be "real" (I think not?) ...

        If the asserts fix is too much here, I can open a separate issue...

        Show
        Michael McCandless added a comment - Patch with new (at first failing) test case, and fix to throw an IllegalArgumentException when the invalid document is added, not at merge time after index is already corrupted. I also added some missing fail() in TestDocValuesIndexing. I also turned off test assertions and fixed tests that relied on them, e.g. by upgrading IndexWriter test points to run w/o asserts, and removing some test points. I put one nocommit about whether the assert we use to verify TokenStream is final should be "real" (I think not?) ... If the asserts fix is too much here, I can open a separate issue...
        Hide
        Michael McCandless added a comment -

        New patch, resolving the nocommits. I think it's ready...

        Show
        Michael McCandless added a comment - New patch, resolving the nocommits. I think it's ready...
        Hide
        Robert Muir added a comment -

        +1, patch looks awesome

        Show
        Robert Muir added a comment - +1, patch looks awesome
        Hide
        ASF subversion and git services added a comment -

        Commit 1633724 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1633724 ]

        LUCENE-6019: detect when DocValuesType illegally changes; add -Dtests.asserts=true|false

        Show
        ASF subversion and git services added a comment - Commit 1633724 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1633724 ] LUCENE-6019 : detect when DocValuesType illegally changes; add -Dtests.asserts=true|false
        Hide
        ASF subversion and git services added a comment -

        Commit 1633725 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633725 ]

        LUCENE-6019: detect when DocValuesType illegally changes; add -Dtests.asserts=true|false

        Show
        ASF subversion and git services added a comment - Commit 1633725 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633725 ] LUCENE-6019 : detect when DocValuesType illegally changes; add -Dtests.asserts=true|false
        Hide
        ASF subversion and git services added a comment -

        Commit 1633770 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1633770 ]

        LUCENE-6019: add another [failing and then fixed] test; do not set FieldInfo.docValueType when it disgrees with low-schema

        Show
        ASF subversion and git services added a comment - Commit 1633770 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1633770 ] LUCENE-6019 : add another [failing and then fixed] test; do not set FieldInfo.docValueType when it disgrees with low-schema
        Hide
        ASF subversion and git services added a comment -

        Commit 1633771 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633771 ]

        LUCENE-6019: add another [failing and then fixed] test; do not set FieldInfo.docValueType when it disgrees with low-schema

        Show
        ASF subversion and git services added a comment - Commit 1633771 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633771 ] LUCENE-6019 : add another [failing and then fixed] test; do not set FieldInfo.docValueType when it disgrees with low-schema
        Hide
        ASF subversion and git services added a comment -

        Commit 1633775 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1633775 ]

        LUCENE-6019: remove printStackTrace

        Show
        ASF subversion and git services added a comment - Commit 1633775 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1633775 ] LUCENE-6019 : remove printStackTrace
        Hide
        ASF subversion and git services added a comment -

        Commit 1633776 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633776 ]

        LUCENE-6019: remove printStackTrace

        Show
        ASF subversion and git services added a comment - Commit 1633776 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633776 ] LUCENE-6019 : remove printStackTrace
        Hide
        ASF subversion and git services added a comment -

        Commit 1633779 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1633779 ]

        LUCENE-6019: detect when DocValuesType illegally changes; add -Dtests.asserts=true|false

        Show
        ASF subversion and git services added a comment - Commit 1633779 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1633779 ] LUCENE-6019 : detect when DocValuesType illegally changes; add -Dtests.asserts=true|false
        Hide
        ASF subversion and git services added a comment -

        Commit 1633830 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1633830 ]

        LUCENE-6019: add more safety during DV flush

        Show
        ASF subversion and git services added a comment - Commit 1633830 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1633830 ] LUCENE-6019 : add more safety during DV flush
        Hide
        ASF subversion and git services added a comment -

        Commit 1633833 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1633833 ]

        LUCENE-6019: add more safety during DV flush

        Show
        ASF subversion and git services added a comment - Commit 1633833 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1633833 ] LUCENE-6019 : add more safety during DV flush
        Hide
        ASF subversion and git services added a comment -

        Commit 1633835 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1633835 ]

        LUCENE-6019: add more safety during DV flush

        Show
        ASF subversion and git services added a comment - Commit 1633835 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1633835 ] LUCENE-6019 : add more safety during DV flush
        Hide
        Michael McCandless added a comment -

        This commit caused LUCENE-6117, which Rob found & fixed (thanks!), but is too big to backport to 4.10.x.

        I think to fix it, I should revert the -Dtests.asserts part of this change (but keep the original bug fix).

        Show
        Michael McCandless added a comment - This commit caused LUCENE-6117 , which Rob found & fixed (thanks!), but is too big to backport to 4.10.x. I think to fix it, I should revert the -Dtests.asserts part of this change (but keep the original bug fix).
        Hide
        Robert Muir added a comment -

        +1, i would rather not cause instability or false failures in the bugfix branch.

        Show
        Robert Muir added a comment - +1, i would rather not cause instability or false failures in the bugfix branch.
        Hide
        ASF subversion and git services added a comment -

        Commit 1646416 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1646416 ]

        LUCENE-6019, LUCENE-6117: remove -Dtests.assert: this is too big a change for a bug-fix release (and it introduced a bug)

        Show
        ASF subversion and git services added a comment - Commit 1646416 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1646416 ] LUCENE-6019 , LUCENE-6117 : remove -Dtests.assert: this is too big a change for a bug-fix release (and it introduced a bug)
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development