Lucene - Core
  1. Lucene - Core
  2. LUCENE-2975

hotspot bug in readvint gives wrong results

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.1, 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      When testing the 3.1-RC1 made by Yonik on the PANGAEA (www.pangaea.de) productive system I figured out that suddenly on a large segment (about 5 GiB) some stored fiels suddenly produce a strange deflate decompression problem (CompressionTools) although the stored fields are no longer pre-3.0 compressed. It seems that the header of the stored field is read incorrectly at the buffer boundary in MultiMMapDir and then FieldsReader just incorrectly detects a deflate-compressed field (CompressionTools).

      The error occurs reproducible on CheckIndex with MMapDirectory, but not with NIODir or SimpleDir. The FDT file of that segment is 2.6 GiB, on Solaris the chunk size is Integer.MAX_VALUE, so we have 2 MultiMMap IndexInputs.

      Robert and me have the index ready as a tar file, we will do tests on our local machines and hopefully solve the bug, maybe introduced by Robert's recent changes to MMap.

      1. LUCENE-2975.patch
        8 kB
        Uwe Schindler
      2. perf.png
        30 kB
        Michael McCandless
      3. LUCENE-2975.patch
        7 kB
        Uwe Schindler
      4. LUCENE-2975.patch
        2 kB
        Uwe Schindler
      5. LUCENE-2975.patch
        2 kB
        Uwe Schindler
      6. LUCENE-2975.patch
        2 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Here the output from CheckIndex (wrapped by log4j):

          2011-03-18 15:45:02,633 INFO de.pangaea.metadataportal.harvester.Checker - Checking index "pangaea"...
          2011-03-18 15:45:02,953 INFO org.apache.lucene.index.CheckIndex - Segments file=segments_g8o numSegments=8 version=FORMAT_3_1 [Lucene 3.1]
          2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -   1 of 8: name=_xdx docCount=644683
          2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -     compound=false
          2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:45:02,956 INFO org.apache.lucene.index.CheckIndex -     numFiles=12
          2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     size (MB)=5,150.848
          2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {optimize=true, mergeFactor=3, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xdx_1f.del]
          2011-03-18 15:45:04,320 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [2786 deleted docs]
          2011-03-18 15:45:05,203 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17616 fields]
          2011-03-18 15:45:06,608 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:18,216 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [54488825 terms; 524900381 terms/docs pairs; 628086112 tokens]
          2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......ERROR [field data are in wrong format: java.util.zip.DataFormatException: unknown compression method]
          2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - org.apache.lucene.index.CorruptIndexException: field data are in wrong format: java.util.zip.DataFormatException: unknown compression method
          2011-03-18 15:50:19,316 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:605)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:377)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:259)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:934)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.IndexReader.document(IndexReader.java:844)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:702)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:517)
          2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - Caused by: java.util.zip.DataFormatException: unknown compression method
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflateBytes(Native Method)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflate(Inflater.java:238)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflate(Inflater.java:256)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:106)
          2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:602)
          2011-03-18 15:50:19,319 INFO org.apache.lucene.index.CheckIndex - 	... 8 more
          2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [492976 total vector count; avg 0.768 term/freq vector fields per doc]
          2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - FAILED
          2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex -     WARNING: fixIndex() would remove reference to this segment; full exception:
          2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - java.lang.RuntimeException: Stored Field test failed
          2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:529)
          2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
          2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - 	at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
          2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -   2 of 8: name=_xgc docCount=2467
          2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     numFiles=2
          2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     size (MB)=12.495
          2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {optimize=false, mergeFactor=10, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xgc_2.del]
          2011-03-18 15:50:42,830 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [7 deleted docs]
          2011-03-18 15:50:42,838 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:42,847 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:46,513 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [260092 terms; 1081044 terms/docs pairs; 1378854 tokens]
          2011-03-18 15:50:46,656 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [73761 total field count; avg 29.984 fields per doc]
          2011-03-18 15:50:46,790 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [660 total vector count; avg 0.268 term/freq vector fields per doc]
          2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -   3 of 8: name=_xgb docCount=128
          2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     numFiles=2
          2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     size (MB)=3.425
          2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xgb_1.del]
          2011-03-18 15:50:46,843 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [93 deleted docs]
          2011-03-18 15:50:46,850 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:46,854 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:47,063 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [36032 terms; 396116 terms/docs pairs; 115684 tokens]
          2011-03-18 15:50:47,069 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [4892 total field count; avg 139.771 fields per doc]
          2011-03-18 15:50:47,073 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [35 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -   4 of 8: name=_xgd docCount=269
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     size (MB)=5.458
          2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:47,077 INFO org.apache.lucene.index.CheckIndex -     no deletions
          2011-03-18 15:50:47,112 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
          2011-03-18 15:50:47,119 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:47,123 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:47,705 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [43919 terms; 652617 terms/docs pairs; 816813 tokens]
          2011-03-18 15:50:47,725 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [37116 total field count; avg 137.978 fields per doc]
          2011-03-18 15:50:47,754 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [269 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -   5 of 8: name=_xge docCount=13
          2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
          2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.923
          2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     no deletions
          2011-03-18 15:50:47,788 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
          2011-03-18 15:50:47,794 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:47,798 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:47,920 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [26350 terms; 45640 terms/docs pairs; 103403 tokens]
          2011-03-18 15:50:47,922 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [2278 total field count; avg 175.231 fields per doc]
          2011-03-18 15:50:47,925 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex -   6 of 8: name=_xgf docCount=13
          2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
          2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.387
          2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     no deletions
          2011-03-18 15:50:47,962 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
          2011-03-18 15:50:47,968 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:47,972 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:48,005 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [6928 terms; 12186 terms/docs pairs; 14593 tokens]
          2011-03-18 15:50:48,006 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [736 total field count; avg 56.615 fields per doc]
          2011-03-18 15:50:48,007 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -   7 of 8: name=_xgg docCount=4
          2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
          2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.319
          2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex -     no deletions
          2011-03-18 15:50:48,044 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
          2011-03-18 15:50:48,051 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:48,054 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:48,086 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [5194 terms; 7559 terms/docs pairs; 8971 tokens]
          2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [436 total field count; avg 109 fields per doc]
          2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [4 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -   8 of 8: name=_xgh docCount=29
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     compound=true
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     size (MB)=1.082
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
          2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     no deletions
          2011-03-18 15:50:48,130 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
          2011-03-18 15:50:48,137 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
          2011-03-18 15:50:48,141 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
          2011-03-18 15:50:48,268 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [25231 terms; 64396 terms/docs pairs; 104657 tokens]
          2011-03-18 15:50:48,271 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [4172 total field count; avg 143.862 fields per doc]
          2011-03-18 15:50:48,274 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [29 total vector count; avg 1 term/freq vector fields per doc]
          2011-03-18 15:50:48,275 INFO org.apache.lucene.index.CheckIndex - WARNING: 1 broken segments (containing 641897 documents) detected
          2011-03-18 15:50:48,275 WARN de.pangaea.metadataportal.harvester.Checker - Finished checking of index "pangaea": Index is corrupt.
          
          Show
          Uwe Schindler added a comment - Here the output from CheckIndex (wrapped by log4j): 2011-03-18 15:45:02,633 INFO de.pangaea.metadataportal.harvester.Checker - Checking index "pangaea"... 2011-03-18 15:45:02,953 INFO org.apache.lucene.index.CheckIndex - Segments file=segments_g8o numSegments=8 version=FORMAT_3_1 [Lucene 3.1] 2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - 1 of 8: name=_xdx docCount=644683 2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - compound=false 2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:45:02,956 INFO org.apache.lucene.index.CheckIndex - numFiles=12 2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - size (MB)=5,150.848 2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - diagnostics = {optimize=true, mergeFactor=3, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xdx_1f.del] 2011-03-18 15:45:04,320 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [2786 deleted docs] 2011-03-18 15:45:05,203 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17616 fields] 2011-03-18 15:45:06,608 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:18,216 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [54488825 terms; 524900381 terms/docs pairs; 628086112 tokens] 2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......ERROR [field data are in wrong format: java.util.zip.DataFormatException: unknown compression method] 2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - org.apache.lucene.index.CorruptIndexException: field data are in wrong format: java.util.zip.DataFormatException: unknown compression method 2011-03-18 15:50:19,316 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:605) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:377) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:259) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:934) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.IndexReader.document(IndexReader.java:844) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:702) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:517) 2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - Caused by: java.util.zip.DataFormatException: unknown compression method 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflateBytes(Native Method) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflate(Inflater.java:238) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflate(Inflater.java:256) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:106) 2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:602) 2011-03-18 15:50:19,319 INFO org.apache.lucene.index.CheckIndex - ... 8 more 2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [492976 total vector count; avg 0.768 term/freq vector fields per doc] 2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - FAILED 2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - WARNING: fixIndex() would remove reference to this segment; full exception: 2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - java.lang.RuntimeException: Stored Field test failed 2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:529) 2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298) 2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72) 2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - 2 of 8: name=_xgc docCount=2467 2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - numFiles=2 2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - size (MB)=12.495 2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - diagnostics = {optimize=false, mergeFactor=10, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xgc_2.del] 2011-03-18 15:50:42,830 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [7 deleted docs] 2011-03-18 15:50:42,838 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:42,847 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:46,513 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [260092 terms; 1081044 terms/docs pairs; 1378854 tokens] 2011-03-18 15:50:46,656 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [73761 total field count; avg 29.984 fields per doc] 2011-03-18 15:50:46,790 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [660 total vector count; avg 0.268 term/freq vector fields per doc] 2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - 3 of 8: name=_xgb docCount=128 2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - numFiles=2 2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - size (MB)=3.425 2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xgb_1.del] 2011-03-18 15:50:46,843 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [93 deleted docs] 2011-03-18 15:50:46,850 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:46,854 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:47,063 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [36032 terms; 396116 terms/docs pairs; 115684 tokens] 2011-03-18 15:50:47,069 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [4892 total field count; avg 139.771 fields per doc] 2011-03-18 15:50:47,073 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [35 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - 4 of 8: name=_xgd docCount=269 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - numFiles=1 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - size (MB)=5.458 2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:47,077 INFO org.apache.lucene.index.CheckIndex - no deletions 2011-03-18 15:50:47,112 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK 2011-03-18 15:50:47,119 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:47,123 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:47,705 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [43919 terms; 652617 terms/docs pairs; 816813 tokens] 2011-03-18 15:50:47,725 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [37116 total field count; avg 137.978 fields per doc] 2011-03-18 15:50:47,754 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [269 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - 5 of 8: name=_xge docCount=13 2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - numFiles=1 2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.923 2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - no deletions 2011-03-18 15:50:47,788 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK 2011-03-18 15:50:47,794 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:47,798 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:47,920 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [26350 terms; 45640 terms/docs pairs; 103403 tokens] 2011-03-18 15:50:47,922 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [2278 total field count; avg 175.231 fields per doc] 2011-03-18 15:50:47,925 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex - 6 of 8: name=_xgf docCount=13 2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - numFiles=1 2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.387 2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - no deletions 2011-03-18 15:50:47,962 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK 2011-03-18 15:50:47,968 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:47,972 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:48,005 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [6928 terms; 12186 terms/docs pairs; 14593 tokens] 2011-03-18 15:50:48,006 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [736 total field count; avg 56.615 fields per doc] 2011-03-18 15:50:48,007 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - 7 of 8: name=_xgg docCount=4 2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - numFiles=1 2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.319 2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex - no deletions 2011-03-18 15:50:48,044 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK 2011-03-18 15:50:48,051 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:48,054 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:48,086 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [5194 terms; 7559 terms/docs pairs; 8971 tokens] 2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [436 total field count; avg 109 fields per doc] 2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [4 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - 8 of 8: name=_xgh docCount=29 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - compound=true 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - hasProx=true 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - numFiles=1 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - size (MB)=1.082 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.} 2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - no deletions 2011-03-18 15:50:48,130 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK 2011-03-18 15:50:48,137 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields] 2011-03-18 15:50:48,141 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields] 2011-03-18 15:50:48,268 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [25231 terms; 64396 terms/docs pairs; 104657 tokens] 2011-03-18 15:50:48,271 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [4172 total field count; avg 143.862 fields per doc] 2011-03-18 15:50:48,274 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [29 total vector count; avg 1 term/freq vector fields per doc] 2011-03-18 15:50:48,275 INFO org.apache.lucene.index.CheckIndex - WARNING: 1 broken segments (containing 641897 documents) detected 2011-03-18 15:50:48,275 WARN de.pangaea.metadataportal.harvester.Checker - Finished checking of index "pangaea": Index is corrupt.
          Hide
          Robert Muir added a comment -

          again it looks like we are bit by the MultiMMapIndexInput:

          For background MMapDirectory has two types of IndexInputs it uses, the MMapIndexInput, and the MultiMMapIndexInput (only used for large files larger than the chunk size, which for Uwe defaults to Integer.MAX_VALUE).

          In the last release no unit tests even tested this MultiMmapII, we added a test and found a bug already in this chunking (LUCENE-2627), but I think we need to do more.

          Ideally our tests, via MockDirectoryWrapper, would sometimes force the use of this MultiMMapDirectory. The idea is to set a low chunk size to force this type of stuff. We might have to add a nasty instanceof check or similar to MockIndexInput to ensure we don't create too many mappings.

          And the second problem would be, if its an integer overflow issue, then we need a standalone 'slow' @Nightly-only test to ensure this case works.

          Show
          Robert Muir added a comment - again it looks like we are bit by the MultiMMapIndexInput: For background MMapDirectory has two types of IndexInputs it uses, the MMapIndexInput, and the MultiMMapIndexInput (only used for large files larger than the chunk size, which for Uwe defaults to Integer.MAX_VALUE). In the last release no unit tests even tested this MultiMmapII, we added a test and found a bug already in this chunking ( LUCENE-2627 ), but I think we need to do more. Ideally our tests, via MockDirectoryWrapper, would sometimes force the use of this MultiMMapDirectory. The idea is to set a low chunk size to force this type of stuff. We might have to add a nasty instanceof check or similar to MockIndexInput to ensure we don't create too many mappings. And the second problem would be, if its an integer overflow issue, then we need a standalone 'slow' @Nightly-only test to ensure this case works.
          Hide
          Hoss Man added a comment -

          comments on list indicate this is a blocker for 3.1

          Show
          Hoss Man added a comment - comments on list indicate this is a blocker for 3.1
          Hide
          Robert Muir added a comment -

          Guys, looks like a JRE bug.

          Here are Uwe and my tests so far on windows64:

          • 1.5.0_22: passes
          • 1.6.0_20: passes
          • 1.6.0_21: passes
          • 1.6.0_24: fails

          But we should think about if this should hold up a release, because its not a bug in our code and trying to make a workaround (like suggesting we use Integer.MAX_VALUE-8 instead of Integer.MAX_VALUE as a default) is just spitting in the wind...

          I for one won't look at the oracle proprietary code.

          +1 to unset blocker, and in the release notes mention that recent sun jvms are buggy.

          Show
          Robert Muir added a comment - Guys, looks like a JRE bug. Here are Uwe and my tests so far on windows64: 1.5.0_22: passes 1.6.0_20: passes 1.6.0_21: passes 1.6.0_24: fails But we should think about if this should hold up a release, because its not a bug in our code and trying to make a workaround (like suggesting we use Integer.MAX_VALUE-8 instead of Integer.MAX_VALUE as a default) is just spitting in the wind... I for one won't look at the oracle proprietary code. +1 to unset blocker, and in the release notes mention that recent sun jvms are buggy.
          Hide
          Uwe Schindler added a comment - - edited

          I just want to add that on Solaris 64 bit I had initially 1.6.0_21 installed. With that version CheckIndex passed, too (verified in my logs). But the bad thing: The application server logs showed random DataFormatErrors like CheckIndex a few times. This is why I started to debug and then upgraded to _24. Since _24 it fails always on Solaris.

          Robert does not look into JDK code, I did: They changed some places in NIO's unsafe Bits class (some memory copy methods), so I assume the bug is there.

          Show
          Uwe Schindler added a comment - - edited I just want to add that on Solaris 64 bit I had initially 1.6.0_21 installed. With that version CheckIndex passed, too (verified in my logs). But the bad thing: The application server logs showed random DataFormatErrors like CheckIndex a few times. This is why I started to debug and then upgraded to _24. Since _24 it fails always on Solaris. Robert does not look into JDK code, I did: They changed some places in NIO's unsafe Bits class (some memory copy methods), so I assume the bug is there.
          Hide
          Robert Muir added a comment -

          1.6.0_23 fails also, so its not just _24+

          Show
          Robert Muir added a comment - 1.6.0_23 fails also, so its not just _24+
          Hide
          Robert Muir added a comment -

          passes always with -Xint, even with the newer jvms.

          looks like a hotspot bug, don't think there is much we can do.

          Show
          Robert Muir added a comment - passes always with -Xint, even with the newer jvms. looks like a hotspot bug, don't think there is much we can do.
          Hide
          Robert Muir added a comment -

          The hotspot bug happens in a pretty crappy place to tell you the truth, readVint()

          I know this because if i make a 'stupid' readVint like this:

              @Override
              public int readVInt() throws IOException {
                byte scratch[] = new byte[1];
                readBytes(scratch, 0, 1);
                byte b = scratch[0];
                int i = b & 0x7F;
                for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                  readBytes(scratch, 0, 1);
                  b = scratch[0];
                  i |= (b & 0x7F) << shift;
                }
                return i;
              }
          

          then it avoids the bug. so i guess the trick would be to make one thats just as fast as what we have now, but still avoids the bug.

          Show
          Robert Muir added a comment - The hotspot bug happens in a pretty crappy place to tell you the truth, readVint() I know this because if i make a 'stupid' readVint like this: @Override public int readVInt() throws IOException { byte scratch[] = new byte[1]; readBytes(scratch, 0, 1); byte b = scratch[0]; int i = b & 0x7F; for (int shift = 7; (b & 0x80) != 0; shift += 7) { readBytes(scratch, 0, 1); b = scratch[0]; i |= (b & 0x7F) << shift; } return i; } then it avoids the bug. so i guess the trick would be to make one thats just as fast as what we have now, but still avoids the bug.
          Hide
          Uwe Schindler added a comment -

          Here loop-unwinding in readVInt and readVLong -> fixes bug!

          Show
          Uwe Schindler added a comment - Here loop-unwinding in readVInt and readVLong -> fixes bug!
          Hide
          Uwe Schindler added a comment -

          A slightly optimized version of the patch (last assignment removed and directly returned). Still works, so I assume the problem is readByte() in the loop.

          Show
          Uwe Schindler added a comment - A slightly optimized version of the patch (last assignment removed and directly returned). Still works, so I assume the problem is readByte() in the loop.
          Hide
          Uwe Schindler added a comment -
          Show
          Uwe Schindler added a comment - The bug looks like that one: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921
          Hide
          Dawid Weiss added a comment -

          We've recently hit an awesome JIT bug too... There is a loop like this one:

                      for (c0 = ALPHABET_SIZE - 2, j = m; 0 < j; --c0)
                      {
                          for (c1 = ALPHABET_SIZE - 1; c0 < c1; j = i, --c1)
                          {
                              i = bucket_B[(c0) * ALPHABET_SIZE + (c1)];
                              if (1 < (j - i))
                              {
                                  ssSort(PAb, i, j, buf, bufsize, 2, n, SA[i] == (m - 1));
                              }
                          }
                      }
          

          c0 and c1 are local variables, so they are decreased on every loop round... but if the bug manifests itself c0 and c1 all of a sudden jump to some incredible values and never change, falling into an endless loop... Fun to watch.

          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7024475

          I'm wondering how production systems can run flawlessly if such things exists out there

          Show
          Dawid Weiss added a comment - We've recently hit an awesome JIT bug too... There is a loop like this one: for (c0 = ALPHABET_SIZE - 2, j = m; 0 < j; --c0) { for (c1 = ALPHABET_SIZE - 1; c0 < c1; j = i, --c1) { i = bucket_B[(c0) * ALPHABET_SIZE + (c1)]; if (1 < (j - i)) { ssSort(PAb, i, j, buf, bufsize, 2, n, SA[i] == (m - 1)); } } } c0 and c1 are local variables, so they are decreased on every loop round... but if the bug manifests itself c0 and c1 all of a sudden jump to some incredible values and never change, falling into an endless loop... Fun to watch. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7024475 I'm wondering how production systems can run flawlessly if such things exists out there
          Hide
          Uwe Schindler added a comment -

          As the loop is now hardcoded and never iterates more than 5 or 9 times, we should add an assert after the last byte is read. Its highest bit must bei 0, else there would follow more bytes which should not happen.

          This can lead to problems if you have no symmetry between writeVLong and readVLong (instead use readVInt).

          Show
          Uwe Schindler added a comment - As the loop is now hardcoded and never iterates more than 5 or 9 times, we should add an assert after the last byte is read. Its highest bit must bei 0, else there would follow more bytes which should not happen. This can lead to problems if you have no symmetry between writeVLong and readVLong (instead use readVInt).
          Hide
          Uwe Schindler added a comment -

          This is an extended patch that also adds a testcase.

          The problem with TestIndexInput was, that it only used a BufferedIndexInput (because MockIndexInput extends that), so the unwinded loops were never tested.

          I added a second test that uses RAMDirectory and writes the bytes and then reads them again. RAMIndexInput extends IndexInput directly, so uses readVInt/readVLong versions form IndexInput.

          I also added the special cases Integer.MAX_VALUE and Long.MAX_VALUE to the tests!

          Show
          Uwe Schindler added a comment - This is an extended patch that also adds a testcase. The problem with TestIndexInput was, that it only used a BufferedIndexInput (because MockIndexInput extends that), so the unwinded loops were never tested. I added a second test that uses RAMDirectory and writes the bytes and then reads them again. RAMIndexInput extends IndexInput directly, so uses readVInt/readVLong versions form IndexInput. I also added the special cases Integer.MAX_VALUE and Long.MAX_VALUE to the tests!
          Hide
          Michael McCandless added a comment - - edited

          I ran trunk vs patch perf test:

          This is w/ MMapDir on multi-seg index, but index isn't large enough to use MultiMMapII.

          Java command is java -server -Xmx2g -Xms2g

          Show
          Michael McCandless added a comment - - edited I ran trunk vs patch perf test: This is w/ MMapDir on multi-seg index, but index isn't large enough to use MultiMMapII. Java command is java -server -Xmx2g -Xms2g
          Hide
          Uwe Schindler added a comment -

          Before I commit this stuff, I wanted to conclude all Robert and me found out yesterday (in fact, Robert was doing the debugging work). I just had the idea how to solve it.

          The issue has not really something to do with MMap and also is in the Hotspot VM even before 1.6.0_22 (I hit the bug in our server logs for a IndexReader.document() call and were not able to reproduce, this was 1.6.0_21). So this bug also affects previous Lucene versions, BUT:

          The bug only happens, if the IndexInput.readByte() method is inlined by hotspot and we are not using BufferedIndexInput (which has its own VInt impl). Lucene 3.0 and prev had a very complicated readByte() method in MMap, hotspot never inlined. But since an performance update in MMapIndexInput/MultiMMapIndexInput, the readByte method got a three liner simply delegating to the ByteBuffer's getByte() and catching an exception to fallback to another impl (for changing the buffer slice in MultiMMap). So most calls are simply calls to NIOs getByte() which may be intrinsics or whatever (we don't know what Hotspot does there). Sun optimizes a lot at NIO!

          This leads to a problem with the loop inside readVInt; all other methods in IndexInput have already unwinded loops, so readLong is not a "for (i=0; i<64; i+=8)" loop, it is coded with all shifts precalculated. To solve the bug, I did the same for readVInt and readVLong.

          We don't know which bug in hotspot is the real cause for this, so David Weiss's bug looks really identical, especially as we had in one of our tests also an endless loop (Robert have seen it, not confirmed). There are a lot of hotspot bugs related to loops, so one of them hit us here.

          Show
          Uwe Schindler added a comment - Before I commit this stuff, I wanted to conclude all Robert and me found out yesterday (in fact, Robert was doing the debugging work). I just had the idea how to solve it. The issue has not really something to do with MMap and also is in the Hotspot VM even before 1.6.0_22 (I hit the bug in our server logs for a IndexReader.document() call and were not able to reproduce, this was 1.6.0_21). So this bug also affects previous Lucene versions, BUT : The bug only happens, if the IndexInput.readByte() method is inlined by hotspot and we are not using BufferedIndexInput (which has its own VInt impl). Lucene 3.0 and prev had a very complicated readByte() method in MMap, hotspot never inlined. But since an performance update in MMapIndexInput/MultiMMapIndexInput, the readByte method got a three liner simply delegating to the ByteBuffer's getByte() and catching an exception to fallback to another impl (for changing the buffer slice in MultiMMap). So most calls are simply calls to NIOs getByte() which may be intrinsics or whatever (we don't know what Hotspot does there). Sun optimizes a lot at NIO! This leads to a problem with the loop inside readVInt; all other methods in IndexInput have already unwinded loops, so readLong is not a "for (i=0; i<64; i+=8)" loop, it is coded with all shifts precalculated. To solve the bug, I did the same for readVInt and readVLong. We don't know which bug in hotspot is the real cause for this, so David Weiss's bug looks really identical, especially as we had in one of our tests also an endless loop (Robert have seen it, not confirmed). There are a lot of hotspot bugs related to loops, so one of them hit us here.
          Hide
          Uwe Schindler added a comment -

          Added comments to explain why the loop was removed (patch applies to 3.1 release branch). Also added changes.txt.

          This is now ready to commit.

          Show
          Uwe Schindler added a comment - Added comments to explain why the loop was removed (patch applies to 3.1 release branch). Also added changes.txt. This is now ready to commit.
          Hide
          Robert Muir added a comment -

          +1

          Show
          Robert Muir added a comment - +1
          Hide
          Uwe Schindler added a comment -

          Committed 3.1 revision: 1083239
          Committed 3.x revision: 1083240
          Committed trunk revision: 1083245

          Show
          Uwe Schindler added a comment - Committed 3.1 revision: 1083239 Committed 3.x revision: 1083240 Committed trunk revision: 1083245
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development