Lucene - Core / LUCENE-5934

4.10 broke backwards compatibility for 4.0 beta & 4.0-release indexes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.10
    • Fix Version/s: 4.10.1, 5.0, 6.0
    • Component/s: core/codecs, core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      As reported by Ian on the user list:

      It's trying to treat them as 3.x

      1. LUCENE-5934.patch
        13 kB
        Uwe Schindler
      2. LUCENE-5934.patch
        11 kB
        Uwe Schindler
      3. LUCENE-5934.patch
        9 kB
        Uwe Schindler
      4. LUCENE-5934.patch
        8 kB
        Uwe Schindler
      5. LUCENE-5934.patch
        4 kB
        Uwe Schindler

        Issue Links

          Activity

          Uwe Schindler added a comment -

          From user list, my final analysis:

          Hi,
          
          we looked into earlier releases:
          
          The index version number of 4.0-ALPHA was "4.0"
          The index version number of 4.0-BETA was "4.0.0.1"
          The index version number of 4.0 final was "4.0.0.2"
          
          Ian's index is therefore a real official 4.0 index.
          
          Unfortunately the version comparison logic in Lucene 4.10 is wrong, as it has a special case for ALPHA and BETA indexes, which does not fit reality. Also, the constants are wrong:
          
            /**
             * Match settings and bugs in Lucene's 4.0.0-ALPHA release.
             * @deprecated (4.1) Use latest
             */
            @Deprecated
            public static final Version LUCENE_4_0_0_ALPHA = new Version(4, 0, 0, 1);
          
            /**
             * Match settings and bugs in Lucene's 4.0.0-BETA release.
             * @deprecated (4.1) Use latest
             */
            @Deprecated
            public static final Version LUCENE_4_0_0_BETA = new Version(4, 0, 0, 2);
          
            /**
             * Match settings and bugs in Lucene's 4.0.0 release.
             * @deprecated (4.1) Use latest
             */
            @Deprecated
            public static final Version LUCENE_4_0_0 = new Version(4, 0, 0);
          
          Because of this and the special case, 4.0.0.2 orders before "4.0.0" (see encodedVersionNumber). This causes IndexReader/IndexWriter to think it was created in 3.x.
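The ordering problem described above can be illustrated with a small model. This is only a sketch and not Lucene's actual Version class: the bit layout in `encode` and the exact shape of the special case in `encodeWithSpecialCase` are assumptions made for illustration, and all names are hypothetical. It shows how ranking a missing prerelease part above prerelease 1 and 2 makes an index stamped "4.0.0.2" sort before a constant defined as (4, 0, 0).

```java
class VersionOrderingSketch {

    /** Packs a version into one int so that comparing ints compares versions (assumed layout). */
    static int encode(int major, int minor, int bugfix, int slot) {
        return major << 18 | minor << 10 | bugfix << 2 | slot;
    }

    /**
     * Assumed model of the faulty special case: a missing prerelease part (0)
     * is treated as "final" and ranked above prerelease 1 and 2.
     */
    static int encodeWithSpecialCase(int major, int minor, int bugfix, int prerelease) {
        int slot = (prerelease == 0) ? 3 : prerelease;
        return encode(major, minor, bugfix, slot);
    }

    /** Ordering that follows what was written to indexes: "4.0" < "4.0.0.1" < "4.0.0.2". */
    static int encodeAsWritten(int major, int minor, int bugfix, int prerelease) {
        return encode(major, minor, bugfix, prerelease);
    }

    public static void main(String[] args) {
        // Index stamped "4.0.0.2" compared against a constant defined as (4, 0, 0).
        boolean special = encodeWithSpecialCase(4, 0, 0, 2) >= encodeWithSpecialCase(4, 0, 0, 0);
        boolean asWritten = encodeAsWritten(4, 0, 0, 2) >= encodeAsWritten(4, 0, 0, 0);
        System.out.println("special case: 4.0.0.2 on or after 4.0.0? " + special);     // false
        System.out.println("as written:   4.0.0.2 on or after 4.0.0? " + asWritten);   // true
    }
}
```

In the first model the 4.0 index fails the "on or after 4.0.0" test, which matches the symptom of the index being handled as if it were 3.x.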
          
          TestBackwardsCompatibility did not find that bug, because the backwards-compatibility index in the tests directory was created with the ALPHA version :(
          
          Uwe
          
          -----
          Uwe Schindler
          H.-H.-Meier-Allee 63, D-28213 Bremen
          http://www.thetaphi.de
          eMail: uwe@thetaphi.de
          
          
          > -----Original Message-----
          > From: Uwe Schindler [mailto:uwe@thetaphi.de]
          > Sent: Wednesday, September 10, 2014 1:42 PM
          > To: java-user@lucene.apache.org
          > Subject: RE: 4.10.0: java.lang.IllegalStateException: cannot write 3x 
          > SegmentInfo unless codec is Lucene3x (got: Lucene40)
          > 
          > Hi Ian,
          > 
          > this index was created with the BETA version of Lucene 4.0:
          > 
          > Segments file=segments_2 numSegments=1 version=4.0.0.2 format=
          >   1 of 1: name=_0 docCount=15730
          > 
          > "4.0.0.2" was the index version number of Lucene 4.0-BETA. This is not 
          > a supported version and may not open correctly. In Lucene 4.10 we 
          > changed version handling and parsing version numbers a bit, so this 
          > may be the cause for the error.
          > 
          > Uwe
          > 
          > -----
          > Uwe Schindler
          > H.-H.-Meier-Allee 63, D-28213 Bremen
          > http://www.thetaphi.de
          > eMail: uwe@thetaphi.de
          > 
          > 
          > > -----Original Message-----
          > > From: Ian Lea [mailto:ian.lea@gmail.com]
          > > Sent: Wednesday, September 10, 2014 1:01 PM
          > > To: java-user@lucene.apache.org
          > > Subject: 4.10.0: java.lang.IllegalStateException: cannot write 3x 
          > > SegmentInfo unless codec is Lucene3x (got: Lucene40)
          > >
          > > Hi
          > >
          > >
          > > On running a quick test after a handful of minor code changes to deal with
          > > 4.10 deprecations, a program that updates an existing index failed with
          > >
          > > Exception in thread "main" java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)
          > >     at org.apache.lucene.index.SegmentInfos.write3xInfo(SegmentInfos.java:607)
          > >
          > > and along the way did something to the index to make it unusable.
          > >
          > > Digging a bit deeper and working on a different old test index that 
          > > was lying around, and taking a backup first this time, this is reproducible.
          > >
          > > The working index:
          > >
          > > total 1036
          > > -rw-r--r-- 1 tril users 165291 Jan 18  2013 _0.fdt
          > > -rw-r--r-- 1 tril users 125874 Jan 18  2013 _0.fdx
          > > -rw-r--r-- 1 tril users   1119 Jan 18  2013 _0.fnm
          > > -rw-r--r-- 1 tril users 378015 Jan 18  2013 _0_Lucene40_0.frq
          > > -rw-r--r-- 1 tril users 350628 Jan 18  2013 _0_Lucene40_0.tim
          > > -rw-r--r-- 1 tril users  13988 Jan 18  2013 _0_Lucene40_0.tip
          > > -rw-r--r-- 1 tril users    311 Jan 18  2013 _0.si
          > > -rw-r--r-- 1 tril users     69 Jan 18  2013 segments_2
          > > -rw-r--r-- 1 tril users     20 Jan 18  2013 segments.gen
          > >
          > > and output from 4.10 CheckIndex
          > >
          > > Opening index @ index/
          > >
          > > Segments file=segments_2 numSegments=1 version=4.0.0.2 format=
          > >   1 of 1: name=_0 docCount=15730
          > >     version=4.0.0.2
          > >     codec=Lucene40
          > >     compound=false
          > >     numFiles=7
          > >     size (MB)=0.987
          > >     diagnostics = {os=Linux, os.version=3.1.0-1.2-desktop, source=flush, lucene.version=4.0.0 1394950 - rmuir - 2012-10-06 02:58:12, os.arch=amd64, java.version=1.7.0_10, java.vendor=Oracle Corporation}
          > >     no deletions
          > >     test: open reader.........OK
          > >     test: check integrity.....OK
          > >     test: check live docs.....OK
          > >     test: fields..............OK [13 fields]
          > >     test: field norms.........OK [0 fields]
          > >     test: terms, freq, prox...OK [53466 terms; 217447 terms/docs pairs; 139382 tokens]
          > >     test: stored fields.......OK [15730 total field count; avg 1 fields per doc]
          > >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
          > >     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
          > >
          > > No problems were detected with this index.
          > >
          > >
          > > Now run this little program
          > >
          > >     public static void main(final String[] _args) throws Exception {
          > >         File index = new File(_args[0]);
          > >         IndexWriterConfig iwcfg = new IndexWriterConfig(Version.LUCENE_4_10_0, new StandardAnalyzer());
          > >         iwcfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
          > >         Directory d = FSDirectory.open(index, new SimpleFSLockFactory(index));
          > >         IndexWriter iw = new IndexWriter(d, iwcfg);
          > >         Document doc1 = new Document();
          > >         doc1.add(new StringField("type", "test", Field.Store.NO));
          > >         iw.addDocument(doc1);
          > >         iw.close();
          > >     }
          > >
          > > and it fails with
          > >
          > > Exception in thread "main" java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)
          > >     at org.apache.lucene.index.SegmentInfos.write3xInfo(SegmentInfos.java:607)
          > >     at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:524)
          > >     at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:1017)
          > >     at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4549)
          > >     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3062)
          > >     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3169)
          > >     at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:915)
          > >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:986)
          > >     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:956)
          > >     at t.main(t.java:25)
          > >
          > > and when I run CheckIndex again I get
          > >
          > >
          > > Opening index @ index/
          > >
          > > ERROR: could not read any segments file in directory
          > > java.nio.file.NoSuchFileException: /tmp/lucene/index/_0.si
          > >     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
          > >     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
          > >     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
          > >     at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
          > >     at java.nio.channels.FileChannel.open(FileChannel.java:287)
          > >     at java.nio.channels.FileChannel.open(FileChannel.java:334)
          > >     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
          > >     at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:52)
          > >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:362)
          > >     at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:458)
          > >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:913)
          > >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:759)
          > >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:454)
          > >     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:414)
          > >     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)
          > >
          > > which is true
          > >
          > > total 1032
          > > -rw-r--r-- 1 tril users 165291 Jan 18  2013 _0.fdt
          > > -rw-r--r-- 1 tril users 125874 Jan 18  2013 _0.fdx
          > > -rw-r--r-- 1 tril users   1119 Jan 18  2013 _0.fnm
          > > -rw-r--r-- 1 tril users 378015 Jan 18  2013 _0_Lucene40_0.frq
          > > -rw-r--r-- 1 tril users 350628 Jan 18  2013 _0_Lucene40_0.tim
          > > -rw-r--r-- 1 tril users  13988 Jan 18  2013 _0_Lucene40_0.tip
          > > -rw-r--r-- 1 tril users     69 Jan 18  2013 segments_2
          > > -rw-r--r-- 1 tril users     20 Jan 18  2013 segments.gen
          > >
          > >
          > > I don't recall the origins of this index but it may well have been
          > > created in the distant past and been upgraded, explicitly or
          > > automatically, along the way. Although evidently not for a while.
          > >
          > >
          > > Running the same test with lucene 4.9.0 and minimal mods to the 
          > > program runs to successful completion.  Here's the CheckIndex output:
          > >
          > >
          > > Opening index @ index-4.9.updated/
          > >
          > > Segments file=segments_3 numSegments=2 versions=[4.0.0.2 .. 4.9] format=
          > >   1 of 2: name=_0 docCount=15730
          > >     codec=Lucene40
          > >     compound=false
          > >     numFiles=7
          > >     size (MB)=0.987
          > >     diagnostics = {os=Linux, os.version=3.1.0-1.2-desktop, source=flush, lucene.version=4.0.0 1394950 - rmuir - 2012-10-06 02:58:12, os.arch=amd64, java.version=1.7.0_10, java.vendor=Oracle Corporation}
          > >     no deletions
          > >     test: open reader.........OK
          > >     test: check integrity.....OK
          > >     test: check live docs.....OK
          > >     test: fields..............OK [13 fields]
          > >     test: field norms.........OK [0 fields]
          > >     test: terms, freq, prox...OK [53466 terms; 217447 terms/docs pairs; 139382 tokens]
          > >     test: stored fields.......OK [15730 total field count; avg 1 fields per doc]
          > >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
          > >     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
          > >
          > >   2 of 2: name=_1 docCount=1
          > >     codec=Lucene49
          > >     compound=true
          > >     numFiles=3
          > >     size (MB)=0.001
          > >     diagnostics = {timestamp=1410281698360, os=Linux, os.version=3.1.0-1.2-desktop, source=flush, lucene.version=4.9.0 1604085 - rmuir - 2014-06-20 06:22:23, os.arch=amd64, java.version=1.7.0_10, java.vendor=Oracle Corporation}
          > >     no deletions
          > >     test: open reader.........OK
          > >     test: check integrity.....OK
          > >     test: check live docs.....OK
          > >     test: fields..............OK [1 fields]
          > >     test: field norms.........OK [0 fields]
          > >     test: terms, freq, prox...OK [1 terms; 1 terms/docs pairs; 0 tokens]
          > >     test: stored fields.......OK [0 total field count; avg 0 fields per doc]
          > >     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
          > >     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
          > >
          > > No problems were detected with this index.
          > >
          > >
          > >
          > >
          > > --
          > > Ian.
          
          Uwe Schindler added a comment - - edited

          Patch for 4.x that fixes the version handling.

          I am not really happy, because Version.parse("4.0.0") now parses to 4.0.0-ALPHA, but this is how it was defined and written to index.

          I now have to check that the codecs correctly detect 4.0 ALPHA/BETA indexes (e.g., that they use the right version constant when comparing).

          Another thing: theoretically, analyzers must compare against LUCENE_4_0_0_ALPHA.

          Another idea I had would be to add a method Version.isMinimumLucene4().

          Robert Muir added a comment -

          What happened with the safety check?

          The user hit it, which is nice:

          java.lang.IllegalStateException: cannot write 3x SegmentInfo unless codec is Lucene3x (got: Lucene40)
              at org.apache.lucene.index.SegmentInfos.write3xInfo(SegmentInfos.java:607)
          

          Can we move it before we do anything destructive, for additional safety? Or does it not matter, due to the fallback logic in commit...

          Michael McCandless added a comment -

          I built a 4.0.0 index and copied forward to 4.x's TestBackwardsCompatibility, and the test fails with this:

          java.io.IOException: file _0.si already exists
          	at __randomizedtesting.SeedInfo.seed([DC4E59ED2DE3E60F:6D2FC24DAADD029A]:0)
          	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:552)
          	at org.apache.lucene.index.SegmentInfos.write3xInfo(SegmentInfos.java:610)
          	at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:535)
          	at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:1033)
          	at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4560)
          	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3067)
          	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3174)
          	at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:916)
          	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:987)
          	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:957)
          

          It's slightly different from what Ian saw because MockDirectoryWrapper (MDW) catches earlier that something is wrong, since index files are supposed to be write-once.

          But without MDW, we would create the output (overwriting the correct one from the index), then inside a try/finally we throw that IllegalStateException, then in the finally clause we close that .si file and delete it. So I suspect exactly one .si file is deleted from Ian's index, causing the corruption.

          > Can we move it before we do anything destructive for additional safety?

          We should definitely move up this check, after adding the missing indices to TestBackCompat.
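The failure mode described above, and the effect of moving the check up, can be sketched in isolation. This is a hypothetical model using plain `java.nio.file` calls, not the actual SegmentInfos code: the class and method names are invented for illustration, and the codec is passed as a plain string.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class CheckBeforeWriteSketch {

    static void requireLucene3x(String codec) {
        if (!"Lucene3x".equals(codec)) {
            throw new IllegalStateException(
                "cannot write 3x SegmentInfo unless codec is Lucene3x (got: " + codec + ")");
        }
    }

    /** Destructive order: open (and truncate) the file first, validate inside try/finally. */
    static void writeCheckLate(Path file, String codec) throws IOException {
        boolean success = false;
        OutputStream out = Files.newOutputStream(file); // truncates an existing file
        try {
            requireLucene3x(codec);
            out.write(1);
            success = true;
        } finally {
            out.close();
            if (!success) {
                Files.deleteIfExists(file); // cleanup also removes the pre-existing file
            }
        }
    }

    /** Safer order: validate before touching anything on disk. */
    static void writeCheckFirst(Path file, String codec) throws IOException {
        requireLucene3x(codec);
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(1);
        }
    }

    /** Returns whether a pre-existing _0.si survives a rejected write. */
    static boolean survives(boolean checkFirst) throws IOException {
        Path dir = Files.createTempDirectory("si-sketch");
        Path si = dir.resolve("_0.si");
        Files.write(si, new byte[] {42});
        try {
            if (checkFirst) writeCheckFirst(si, "Lucene40"); else writeCheckLate(si, "Lucene40");
        } catch (IllegalStateException expected) {
            // both variants reject the codec; only the side effects differ
        }
        boolean exists = Files.exists(si);
        Files.deleteIfExists(si);
        Files.deleteIfExists(dir);
        return exists;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("check late, file survives:  " + survives(false)); // false
        System.out.println("check first, file survives: " + survives(true));  // true
    }
}
```

Both variants throw the same exception; only the late check leaves the directory without its `_0.si`, which is consistent with the NoSuchFileException in the second CheckIndex run.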

          Uwe Schindler added a comment - - edited

          Hi,
          With my patch, the 4.0-ALPHA index (which declares itself as 4.0) now in fact fails with the above error message. The reason is that the default codec checks for Version.onOrAfter(LUCENE_4_0_0), which does not include 4.0-ALPHA and 4.0-BETA indexes.

          To make this check correct, we have three possibilities: use Version.onOrAfter(LUCENE_4_0_0_ALPHA); or, as suggested before, add Version.isMinimumLucene4() (the same applies to analyzers, but it's not important here); or hardcode 4.0 as a string into the codec, so it hits everything.
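The difference between these checks can be shown with a small model. This is a sketch under stated assumptions, not Lucene code: versions are plain ints with an assumed bit layout, the prerelease numbers follow the index version strings quoted earlier in this issue ("4.0", "4.0.0.1", "4.0.0.2"), and `isMinimumLucene4` here is a hypothetical stand-in for the proposed method.

```java
class MinimumVersionSketch {

    /** Assumed packing; comparing the ints compares the versions. */
    static int encode(int major, int minor, int bugfix, int prerelease) {
        return major << 18 | minor << 10 | bugfix << 2 | prerelease;
    }

    // Prerelease slots taken from the index version strings quoted in this issue.
    static final int ALPHA_4_0 = encode(4, 0, 0, 0); // "4.0"
    static final int BETA_4_0 = encode(4, 0, 0, 1);  // "4.0.0.1"
    static final int FINAL_4_0 = encode(4, 0, 0, 2); // "4.0.0.2"
    static final int V3_6 = encode(3, 6, 0, 0);      // some 3.x index

    static boolean onOrAfter(int version, int other) {
        return version >= other;
    }

    /** Hypothetical helper: true for anything written by a 4.x or later release. */
    static boolean isMinimumLucene4(int version) {
        return (version >>> 18) >= 4;
    }

    public static void main(String[] args) {
        // Comparing against the final release misclassifies ALPHA and BETA indexes.
        System.out.println(onOrAfter(ALPHA_4_0, FINAL_4_0)); // false
        System.out.println(onOrAfter(BETA_4_0, FINAL_4_0));  // false
        // Comparing against ALPHA, or checking only the major version, accepts all 4.0 indexes.
        System.out.println(onOrAfter(ALPHA_4_0, ALPHA_4_0)); // true
        System.out.println(isMinimumLucene4(BETA_4_0));      // true
        System.out.println(isMinimumLucene4(V3_6));          // false
    }
}
```

A major-version check avoids having to remember which 4.0 constant is the lowest, at the cost of adding one more method to the API.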

          Ryan Ernst added a comment -

          Checking on or after 4.0 alpha seems like the correct solution? Technically it did come before beta and final. So the 40 codec should be minimum of alpha.

          Uwe Schindler added a comment - - edited

          Ryan: How about analyzers then? In my opinion the checks there should also use 4_0_0_ALPHA?

          Once my Eclipse works again (ant resolve hangs for a long time here and then fails to download the wstx parser), I will check for usages of the constant LUCENE_4_0_0.

          Uwe Schindler added a comment - - edited

          New patch, fixing the version comparisons for Version >= 4.x

          Ryan Ernst added a comment -

          Shouldn't the Version.java changes be dropped then?

          Uwe Schindler added a comment -

          Here is a new patch: for now I fixed the "deprecated" old constants LUCENE_40 and LUCENE_4_0 to refer to LUCENE_4_0_0_ALPHA.

          Opinions?

          Ryan Ernst added a comment -

          Ok, patch looks good. I now understand that the previous comment with LUCENE_MAIN_VERSION, which described the layout of 4.0.0.1 < 4.0.0.2 < 4.0.0 was incorrect. The indexes were actually written with 4.0.0 (alpha), 4.0.0.1 (beta) and 4.0.0.2 (final)?

          Uwe Schindler added a comment -

          I beefed up the tests.

          Now parseLeniently is consistent with the constants (bidirectional). I also fixed the parsing to be really case-insensitive.

          Tests pass, including Solr.

          Uwe Schindler added a comment -

          The indexes were actually written with 4.0.0 (alpha), 4.0.0.1 (beta) and 4.0.0.2 (final)?

          Exactly!

          Ryan Ernst added a comment -

          +1 to the current patch.

          Uwe Schindler added a comment -

          Added a new test that checks for real bidirectionality between parseLeniently and the parse results: every field name, taken as a string, should parse to itself.

          This made me add all special cases to parseLeniently as a String-based switch statement.
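
          For illustration only (this is not the code from the patch): a minimal sketch of such a round-trip check, assuming a hypothetical ToyVersion class with three constants. The class, its fields and the checkRoundTrip helper are invented for this example; only the idea of a case-insensitive String switch and of parsing each constant's field name back to the same constant comes from the comment above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;

public class ParseRoundTripSketch {

    /** Hypothetical miniature of a version class; NOT Lucene's Version. */
    public static final class ToyVersion {
        public static final ToyVersion LUCENE_4_0_0_ALPHA = new ToyVersion("4.0.0-alpha");
        public static final ToyVersion LUCENE_4_0_0_BETA = new ToyVersion("4.0.0-beta");
        public static final ToyVersion LUCENE_4_0_0 = new ToyVersion("4.0.0");

        final String label;

        private ToyVersion(String label) {
            this.label = label;
        }

        /** Case-insensitive lookup by constant name, written as a String switch. */
        public static ToyVersion parseLeniently(String s) {
            switch (s.toUpperCase(Locale.ROOT)) {
                case "LUCENE_4_0_0_ALPHA": return LUCENE_4_0_0_ALPHA;
                case "LUCENE_4_0_0_BETA": return LUCENE_4_0_0_BETA;
                case "LUCENE_4_0_0": return LUCENE_4_0_0;
                default: throw new IllegalArgumentException("unknown version: " + s);
            }
        }
    }

    /** Every public static ToyVersion field must parse back to itself, in any case. */
    static int checkRoundTrip() {
        int checked = 0;
        for (Field f : ToyVersion.class.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) || f.getType() != ToyVersion.class) {
                continue; // skip instance fields such as 'label'
            }
            Object expected;
            try {
                expected = f.get(null);
            } catch (IllegalAccessException e) {
                throw new AssertionError(e);
            }
            String name = f.getName();
            if (ToyVersion.parseLeniently(name) != expected
                    || ToyVersion.parseLeniently(name.toLowerCase(Locale.ROOT)) != expected) {
                throw new AssertionError("round trip failed for " + name);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("constants checked: " + checkRoundTrip());
    }
}
```

          Driving the check from reflection means a newly added constant that is missing from the switch fails the test instead of going unnoticed.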

          Michael McCandless added a comment -

          Can we move the codec check:

                if ((si.getCodec() instanceof Lucene3xCodec) == false) {
          

          up into SIS.write, just before:

                    if (!segmentWasUpgraded(directory, si)) {
          

          ?

          This way if something like this tries to happen again in 4.x, at least we don't corrupt-on-kiss and instead just throw the exception and rollback the commit.
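
          For illustration only: a minimal sketch of why the order of the check matters, assuming a simulated directory held in a Map. The method names writeThenCheck and checkThenWrite and the is3xSegment flag are hypothetical and are not Lucene APIs; the failure sequence (create the output, throw, delete in the finally clause) follows the description given earlier in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckBeforeWriteSketch {

    /** Unsafe order: the output is created (clobbering the existing file) before the check. */
    static void writeThenCheck(Map<String, String> dir, String file, boolean is3xSegment) {
        boolean success = false;
        dir.put(file, ""); // simulated createOutput: silently overwrites the existing entry
        try {
            if (!is3xSegment) {
                throw new IllegalStateException("not a 3.x segment: " + file);
            }
            dir.put(file, "3.x upgrade info");
            success = true;
        } finally {
            if (!success) {
                dir.remove(file); // cleanup deletes the file, and the original content with it
            }
        }
    }

    /** Safe order: validate first, touch the directory only afterwards. */
    static void checkThenWrite(Map<String, String> dir, String file, boolean is3xSegment) {
        if (!is3xSegment) {
            throw new IllegalStateException("not a 3.x segment: " + file);
        }
        dir.put(file, "3.x upgrade info");
    }

    public static void main(String[] args) {
        Map<String, String> dir = new HashMap<>();
        dir.put("_0.si", "valid 4.0 segment info");

        try {
            checkThenWrite(dir, "_0.si", false);
        } catch (IllegalStateException expected) {
            // the commit is rejected, nothing was modified
        }
        System.out.println("after checkThenWrite, _0.si present: " + dir.containsKey("_0.si"));

        try {
            writeThenCheck(dir, "_0.si", false);
        } catch (IllegalStateException expected) {
            // the commit is rejected, but the file is already gone
        }
        System.out.println("after writeThenCheck, _0.si present: " + dir.containsKey("_0.si"));
    }
}
```

          In both variants the caller sees the same exception; the difference is whether the existing file survives it.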

          ASF subversion and git services added a comment -

          Commit 1624073 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1624073 ]

          LUCENE-5934: Fix backwards compatibility for 4.0 indexes.

          Uwe Schindler added a comment - - edited

          Can we move the codec check:

          Mike, can you do this? I already committed to 4.x and I am now backporting/forward-porting.

          In fact I would leave the test where it is and instead add a second test at the !segmentWasUpgraded() place? This feels better to me, although it might be a duplicate test. Better to test more thoroughly than to risk corruption.

          ASF subversion and git services added a comment -

          Commit 1624089 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1624089 ]

          Merged revision(s) 1624073 from lucene/dev/branches/branch_4x:
          LUCENE-5934: Fix backwards compatibility for 4.0 indexes.

          ASF subversion and git services added a comment -

          Commit 1624100 from Uwe Schindler in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1624100 ]

          Merged revision(s) 1624073 from lucene/dev/branches/branch_4x:
          LUCENE-5934: Fix backwards compatibility for 4.0 indexes.

          ASF subversion and git services added a comment -

          Commit 1624141 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1624141 ]

          LUCENE-5934: also check this IllegalStateException up higher so it will not cause corruption when IW kisses an index if it ever kicks in again (hopefully not!)

          Michael McCandless added a comment -

          In fact I would leave the test where it is and instead add a second test at the !segmentWasUpgraded() place?

          OK I did that, except I moved the inner check up higher to before we make any destructive changes to the index.

          Also, I reverted the commit here but kept Robert's commit, and confirmed that TestBackCompat fails with the IllegalStateExc and that the index is NOT corrupt.

          Robert Muir added a comment -

          thank you Mike

          ASF subversion and git services added a comment -

          Commit 1624144 from Uwe Schindler in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1624144 ]

          Merged revision(s) 1624141 from lucene/dev/branches/branch_4x:
          LUCENE-5934: also check this IllegalStateException up higher so it will not cause corruption when IW kisses an index if it ever kicks in again (hopefully not!)

          Uwe Schindler added a comment -

          Thanks Mike. I backported it. Issue finally resolved

          ASF subversion and git services added a comment -

          Commit 1624146 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1624146 ]

          LUCENE-5934: Add Ian Lea to CHANGES.txt

          ASF subversion and git services added a comment -

          Commit 1624148 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1624148 ]

          Merged revision(s) 1624146 from lucene/dev/trunk:
          LUCENE-5934: Add Ian Lea to CHANGES.txt

          ASF subversion and git services added a comment -

          Commit 1624149 from Uwe Schindler in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1624149 ]

          Merged revision(s) 1624148 from lucene/dev/branches/branch_4x:
          Merged revision(s) 1624146 from lucene/dev/trunk:
          LUCENE-5934: Add Ian Lea to CHANGES.txt

          Ryan Ernst added a comment -

          I just updated the description to make it clear this issue was only with upgrading 4.0.0-ALPHA or 4.0.0-BETA indexes.

          Robert Muir added a comment -

          Sorry, Ryan, that's incorrect.

          The backwards compat here is too complicated!

          Ryan Ernst added a comment -

          Doh, sorry about that. I was confused, and I want to clarify now what I understand.

          The bug would cause:
          4.0.0-ALPHA to be read as LUCENE_4_0_0
          4.0.0-BETA to be read as LUCENE_4_0_0_ALPHA
          4.0.0-FINAL to be read as LUCENE_4_0_0_BETA

          Then, because the check was looking for onOrAfter(LUCENE_4_0_0), beta and final indexes would incorrectly fall through to the 3x handling code.
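
          For illustration only: a minimal sketch of the mapping summarized above, assuming a toy Release enum whose declaration order (ALPHA < BETA < FINAL) stands in for the real version ordering. The enum, the two maps and handledAs4x are hypothetical names and do not reproduce Lucene's Version class; the string-to-release mappings are taken from this thread.

```java
import java.util.Map;

public class Lucene40VersionSketch {

    /** Toy release ordering, by declaration order: ALPHA < BETA < FINAL. */
    enum Release {
        ALPHA, BETA, FINAL;

        boolean onOrAfter(Release other) {
            return compareTo(other) >= 0;
        }
    }

    /** How the on-disk version strings were interpreted by the bug, per the summary above. */
    static final Map<String, Release> BUGGY = Map.of(
            "4.0", Release.FINAL,
            "4.0.0.1", Release.ALPHA,
            "4.0.0.2", Release.BETA);

    /** What the releases actually wrote: 4.0 (alpha), 4.0.0.1 (beta), 4.0.0.2 (final). */
    static final Map<String, Release> FIXED = Map.of(
            "4.0", Release.ALPHA,
            "4.0.0.1", Release.BETA,
            "4.0.0.2", Release.FINAL);

    /** True if an index with this version string passes the "is at least 4.x" check. */
    static boolean handledAs4x(Map<String, Release> mapping, String onDisk, Release minimum) {
        return mapping.get(onDisk).onOrAfter(minimum);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"4.0", "4.0.0.1", "4.0.0.2"}) {
            System.out.println(v
                    + " buggy mapping, minimum FINAL: " + handledAs4x(BUGGY, v, Release.FINAL)
                    + " | fixed mapping, minimum ALPHA: " + handledAs4x(FIXED, v, Release.ALPHA));
        }
    }
}
```

          In this toy model, correcting the mapping alone is not enough: with a minimum of FINAL, the real alpha and beta indexes would still be rejected, which is why the check itself was also moved to the alpha constant.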

          Michael McCandless added a comment -

          Bulk close for Lucene/Solr 4.10.1 release


            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Robert Muir
            • Votes:
              0
              Watchers:
              3
