Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      We should remove support for 2.x (and 1.9) indexes in 4.0. It seems that nothing can be done in 3x because there is no special code which handles 1.9, so we'll leave it there. This issue should cover:

      1. Remove the .zip indexes
      2. Remove the unnecessary code from SegmentInfo and SegmentInfos. Mike suggests we compare the version headers at the top of SegmentInfos, in 2.9.x vs 3.0.x, to see which ones can go.
      3. remove FORMAT_PRE from FieldInfos
      4. Remove old format from TermVectorsReader

      If you know of other places where code can be removed, then please post a comment here.

      I don't know when I'll have time to handle it, definitely not in the next few days. So if someone wants to take a stab at it, be my guest.

      1. LUCENE-2480.patch
        96 kB
        Shai Erera
      2. LUCENE-2480.patch
        96 kB
        Shai Erera
      3. LUCENE-2480.patch
        92 kB
        Earwin Burrfoot
      4. LUCENE-2480.patch
        52 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch removes code which supports old indexes. If there are more places in the code that you know of, please let me know.

        Show
        Shai Erera added a comment - Patch removes code which supports old indexes. If there are more places in the code that you know of, please let me know.
        Hide
        Michael McCandless added a comment -

        Great work Shai!

        Another place we can remove is the preUTF8Strings in oal.store.DataInput, and anywhere that calls setModifiedUTF8StringsMode, and I think also FieldsWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES?

        Show
        Michael McCandless added a comment - Great work Shai! Another place we can remove is the preUTF8Strings in oal.store.DataInput, and anywhere that calls setModifiedUTF8StringsMode, and I think also FieldsWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES?
        Hide
        Earwin Burrfoot added a comment -

        Here we go. Pre-utf8/compressed fields support removed, microcleanups/additions to previous patch.

        Show
        Earwin Burrfoot added a comment - Here we go. Pre-utf8/compressed fields support removed, microcleanups/additions to previous patch.
        Hide
        Shai Erera added a comment -

        Thanks Earwin ! Few comments:

        • You didn't remove the .zip indexes?
        • You removed the code from TestBackwardsCompatibilty which creates the older indexes. While this isn't needed currently for trunk - the code is commented out, and I don't think it bothers anyone - one day, after 4.0 is released, we will need to use it. So I don't think we should remove that piece of code.
        • TestIndexFileDeleter failed 'cause of deletable file - fixed it.

        Attached is the merged patch, w/ the above fixes. Solr tests fail (.dataimport. tests) for some reason. I don't understand this yet - so if someone more knowledgeable about Solr than me cares to take a look, I'd appreciate it. I see many things like that (not sure how those tests are related to this issue):

            [junit]
            [junit]  request was: q=id:500&start=0&qt=standard&version=2.2&rows=20
            [junit] junit.framework.AssertionFailedError: now it should query failed XPath: //result[@numFound=1]
            [junit]  xml response was: <?xml version="1.0" encoding="UTF-8"?>
            [junit] <response>
            [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">16</int></lst><result name="response" numFound="0" start="0"/>
            [junit] </response>
            [junit]
            [junit]  request was: q=id:500&start=0&qt=standard&version=2.2&rows=20
            [junit]     at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:236)
            [junit]     at org.apache.solr.update.AutoCommitTest.testMaxTime(AutoCommitTest.java:218)
        
        Show
        Shai Erera added a comment - Thanks Earwin ! Few comments: You didn't remove the .zip indexes? You removed the code from TestBackwardsCompatibilty which creates the older indexes. While this isn't needed currently for trunk - the code is commented out, and I don't think it bothers anyone - one day, after 4.0 is released, we will need to use it. So I don't think we should remove that piece of code. TestIndexFileDeleter failed 'cause of deletable file - fixed it. Attached is the merged patch, w/ the above fixes. Solr tests fail ( .dataimport. tests) for some reason. I don't understand this yet - so if someone more knowledgeable about Solr than me cares to take a look, I'd appreciate it. I see many things like that (not sure how those tests are related to this issue): [junit] [junit] request was: q=id:500&start=0&qt=standard&version=2.2&rows=20 [junit] junit.framework.AssertionFailedError: now it should query failed XPath: //result[@numFound=1] [junit] xml response was: <?xml version= "1.0" encoding= "UTF-8" ?> [junit] <response> [junit] <lst name= "responseHeader" >< int name= "status" >0</ int >< int name= "QTime" >16</ int ></lst><result name= "response" numFound= "0" start= "0" /> [junit] </response> [junit] [junit] request was: q=id:500&start=0&qt=standard&version=2.2&rows=20 [junit] at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:236) [junit] at org.apache.solr.update.AutoCommitTest.testMaxTime(AutoCommitTest.java:218)
        Hide
        Shai Erera added a comment -
        • Removed the byte written by SegmentInfo (hasSingleNormsFile - redundant).
        • Added live migration code to SI.ctor to read that byte if pre-4.0 index
        • Renamed FORMAT_FLEX_POSTING to FORMAT_4_0. I think we should do this going forward in more places, so that it's clear to which version this format belongs and will make the task of removing support for old indexes in the future easier. The jdoc of the constant needs to describe what is covered by that format.

        Besides the Solr tests, everything passes.

        BTW, I don't think we should stop writing that byte in 3.1 - not sure if hasSingleNormsFile is always true, and even if so, 3.0 indexes still need to be supported and they write the byte. And it does not add much of complexity.

        Show
        Shai Erera added a comment - Removed the byte written by SegmentInfo (hasSingleNormsFile - redundant). Added live migration code to SI.ctor to read that byte if pre-4.0 index Renamed FORMAT_FLEX_POSTING to FORMAT_4_0. I think we should do this going forward in more places, so that it's clear to which version this format belongs and will make the task of removing support for old indexes in the future easier. The jdoc of the constant needs to describe what is covered by that format. Besides the Solr tests, everything passes. BTW, I don't think we should stop writing that byte in 3.1 - not sure if hasSingleNormsFile is always true, and even if so, 3.0 indexes still need to be supported and they write the byte. And it does not add much of complexity.
        Hide
        Shai Erera added a comment -

        Solr tests pass too ... dunno what happened, but they started to pass. And the dataimport tests do not seem they should be affected by this issue. So I'd like to commit this soon.

        Show
        Shai Erera added a comment - Solr tests pass too ... dunno what happened, but they started to pass. And the dataimport tests do not seem they should be affected by this issue. So I'd like to commit this soon.
        Hide
        Shai Erera added a comment -

        Thanks Mike and Earwin for your help !

        Committed revision 949509.

        Show
        Shai Erera added a comment - Thanks Mike and Earwin for your help ! Committed revision 949509.
        Hide
        Earwin Burrfoot added a comment -

        Wow! So fast!

        You didn't remove the .zip indexes?

        Your patch didn't remove them for me, patch file format can't handle binary stuff.

        You removed the code from TestBackwardsCompatibilty ...... So I don't think we should remove that piece of code.

        Several lines later there sits a piece of absolutely identical commented-out code. The only difference is that it doesn't use "preLockless" in method names.
        If someone really needs it later, the code I left over suits him no less than the code I removed.

        TestIndexFileDeleter failed 'cause of deletable file - fixed it.

        Cool! It's very strange I missed the failure.

        Renamed FORMAT_FLEX_POSTING to FORMAT_4_0

        FORMAT_4_0_FLEX_POSTINGS? I loved the self-descriptiveness

        And, yup. Bye-bye to the byte.

        Show
        Earwin Burrfoot added a comment - Wow! So fast! You didn't remove the .zip indexes? Your patch didn't remove them for me, patch file format can't handle binary stuff. You removed the code from TestBackwardsCompatibilty ...... So I don't think we should remove that piece of code. Several lines later there sits a piece of absolutely identical commented-out code. The only difference is that it doesn't use "preLockless" in method names. If someone really needs it later, the code I left over suits him no less than the code I removed. TestIndexFileDeleter failed 'cause of deletable file - fixed it. Cool! It's very strange I missed the failure. Renamed FORMAT_FLEX_POSTING to FORMAT_4_0 FORMAT_4_0_FLEX_POSTINGS? I loved the self-descriptiveness And, yup. Bye-bye to the byte.
        Hide
        Shai Erera added a comment -

        Your patch didn't remove them for me, patch file format can't handle binary stuff.

        Strange, there were lines in my patch file which indicated they are removed, but were absent from yours. Anyway, they're removed now.

        Several lines later there sits a piece of absolutely identical commented-out code.

        I don't see it. All I see is a comment and two methods that are used to generate the old indexes.

        FORMAT_4_0_FLEX_POSTINGS

        While I don't mind that sort of descriptiveness, it does not fit in all cases, such as this case – this format is related to both flex postings, but also to that byte removed from the SegmentInfo . So I think we should keep the names simple, and have a useful javadoc. It's the sort of constants no one really cares about in his daily work, only when you handle file format backwards stuff .

        Also, if by chance (like this case, again) you get to change the file format version twice, by two different people, with long time interval between the two, a FORMAT_4_0 should alert you that that's used for the unreleased index, and so you don't need to bump it up again (here, jumping from -9 to -11) accidentally.

        Show
        Shai Erera added a comment - Your patch didn't remove them for me, patch file format can't handle binary stuff. Strange, there were lines in my patch file which indicated they are removed, but were absent from yours. Anyway, they're removed now. Several lines later there sits a piece of absolutely identical commented-out code. I don't see it. All I see is a comment and two methods that are used to generate the old indexes. FORMAT_4_0_FLEX_POSTINGS While I don't mind that sort of descriptiveness, it does not fit in all cases, such as this case – this format is related to both flex postings, but also to that byte removed from the SegmentInfo . So I think we should keep the names simple, and have a useful javadoc. It's the sort of constants no one really cares about in his daily work, only when you handle file format backwards stuff . Also, if by chance (like this case, again) you get to change the file format version twice, by two different people, with long time interval between the two, a FORMAT_4_0 should alert you that that's used for the unreleased index, and so you don't need to bump it up again (here, jumping from -9 to -11) accidentally.
        Hide
        Earwin Burrfoot added a comment -

        Strange, there were lines in my patch file which indicated they are removed, but were absent from yours.

        Index: lucene/src/test/org/apache/lucene/index/index.19.cfs.zip
        ===================================================================
        Cannot display: file marked as a binary type.
        svn:mime-type = application/octet-stream
        Index: lucene/src/test/org/apache/lucene/index/index.19.nocfs.zip
        ===================================================================
        Cannot display: file marked as a binary type.
        svn:mime-type = application/octet-stream
        Index: lucene/src/test/org/apache/lucene/index/index.20.cfs.zip
        ===================================================================
        Cannot display: file marked as a binary type.
        svn:mime-type = application/octet-stream
        

        These lines in your patch mean that diff noticed the differences, but failed to express them - nothing more. You could modify all these files instead of deleting, and get the same result.

        I don't see it.

        Ctrl+F, "testCreateCFS"
        Ok, they're not completely identical. But I don't really care either way.

        .... FORMAT stuff .....

        Ok.

        Show
        Earwin Burrfoot added a comment - Strange, there were lines in my patch file which indicated they are removed, but were absent from yours. Index: lucene/src/test/org/apache/lucene/index/index.19.cfs.zip =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: lucene/src/test/org/apache/lucene/index/index.19.nocfs.zip =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: lucene/src/test/org/apache/lucene/index/index.20.cfs.zip =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream These lines in your patch mean that diff noticed the differences, but failed to express them - nothing more. You could modify all these files instead of deleting, and get the same result. I don't see it. Ctrl+F, "testCreateCFS" Ok, they're not completely identical. But I don't really care either way. .... FORMAT stuff ..... Ok.
        Hide
        Uwe Schindler added a comment -

        I reopen this issue, as the following problem is there:
        When somebody tries to open an pre-3.0 index, it should fail with a meaningful exception. At least e.g. FieldsReader (where I found dead code after your removal) does not check that the version is ge the minimum version. We should add a static FORMAT_MINIMUM to all places where formats are used and check it on reading.

        Show
        Uwe Schindler added a comment - I reopen this issue, as the following problem is there: When somebody tries to open an pre-3.0 index, it should fail with a meaningful exception. At least e.g. FieldsReader (where I found dead code after your removal) does not check that the version is ge the minimum version. We should add a static FORMAT_MINIMUM to all places where formats are used and check it on reading.
        Hide
        Uwe Schindler added a comment -

        Here a first patch, that handles this in FieldsReader to show how it should be done elsewhere, too. This patch also removes some "dead" code in FieldReader, too. I left the method in, as we may need canReadRawDocs() for later format changes, too.

        Show
        Uwe Schindler added a comment - Here a first patch, that handles this in FieldsReader to show how it should be done elsewhere, too. This patch also removes some "dead" code in FieldReader, too. I left the method in, as we may need canReadRawDocs() for later format changes, too.
        Hide
        Michael McCandless added a comment -

        I already opened LUCENE-2523 for this, but it's OK if we do the work here and resolve that one as a dup.

        I think you should see an IndexTooOldException instead of CorruptIndexException (though the former could subclass the latter?)

        Show
        Michael McCandless added a comment - I already opened LUCENE-2523 for this, but it's OK if we do the work here and resolve that one as a dup. I think you should see an IndexTooOldException instead of CorruptIndexException (though the former could subclass the latter?)
        Hide
        Uwe Schindler added a comment -

        Oh, I move over to this issue. I am almost finished!

        Show
        Uwe Schindler added a comment - Oh, I move over to this issue. I am almost finished!
        Hide
        Uwe Schindler added a comment -

        Closing again.

        Show
        Uwe Schindler added a comment - Closing again.

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development