Lucene - Core
LUCENE-4114

Large docID / docvalue size combination produces arithmetic overflow

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA, 6.0
    • Fix Version/s: 4.0-ALPHA, 6.0
    • Component/s: core/codecs
    • Labels:
      None
    • Environment:

      Ubuntu 10.04
      Sun Java 6 b26
      MultiIndex over six directoryindices with ~80M documents each.

    • Lucene Fields:
      New

      Description

      My test case has run across an arithmetic bug in FixedStraightBytesImpl, specifically in the overridden position(int) method. The combination of size=32 and docID=70M overflows int arithmetic and yields a negative value, which causes a stack trace in MMapDirectory.seek(long) that I post below. I would imagine this hasn't been hit before because most index shards have fewer than 70M documents.

      That is to say, when asking for the docvalue in a FixedStraightBytes field where the value size is 32 and the docID is sufficiently high (32 * docID exceeds Integer.MAX_VALUE once docID reaches 67,108,864), an arithmetic overflow occurs at line 359 of FixedStraightBytesImpl.java, causing the method to compute a negative position when operating on the first (0th) shard with a baseOffset of 0. This produces an IllegalArgumentException:

      Caused by: java.lang.IllegalArgumentException: Seeking to negative position: MMapIndexInput(_7_5_dv.dat in path="/.../_7_dv.cfs" slice=5:2562955684)
      at org.apache.lucene.store.MMapDirectory$MMapIndexInput.seek(MMapDirectory.java:396)
      at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$DirectFixedStraightSource.position(FixedStraightBytesImpl.java:359)
      at org.apache.lucene.codecs.lucene40.values.DirectSource.getBytes(DirectSource.java:60)
      at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$DirectFixedStraightSource.getBytes(FixedStraightBytesImpl.java:349)
      ...
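      The overflow described above can be reproduced in isolation. The following is a minimal sketch, not the actual Lucene source: the class and method names are hypothetical, and the expression `baseOffset + size * docID` is an assumption about the shape of the computation at line 359, based on the description and on the reported fix (casting docID to long).

```java
// Hypothetical stand-alone reproduction; not taken from FixedStraightBytesImpl.
final class DocValuePosition {

    // Assumed shape of the buggy computation: size * docID is evaluated
    // in 32-bit int arithmetic and wraps before being widened to long.
    static long positionOverflowing(long baseOffset, int docID, int size) {
        return baseOffset + size * docID;
    }

    // Widening docID to long first forces a 64-bit multiplication,
    // which is the fix described in the follow-up comment.
    static long positionFixed(long baseOffset, int docID, int size) {
        return baseOffset + size * (long) docID;
    }

    public static void main(String[] args) {
        int docID = 70_000_000; // values from the report
        int size = 32;
        System.out.println(positionOverflowing(0L, docID, size)); // -2054967296
        System.out.println(positionFixed(0L, docID, size));       // 2240000000
    }
}
```

      With a baseOffset of 0 the wrapped product is passed on unchanged, which would explain why the first shard hits the negative seek.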

      1. LUCENE-4114.patch
        2 kB
        Michael McCandless

        Activity

        Walt Elder added a comment -

        Followup: I built my own version of lucene-core with this line corrected (I cast docID to long) and ran my tests successfully.

        Jack Krupansky added a comment -

        There may be a similar bug at line 345:

        return data.fillSlice(bytesRef, docID * size, size);

        And at line 340 in Lucene40StoredFieldsReader:

        indexStream.seek(HEADER_LENGTH_IDX + docID * 8L);

        Jack Krupansky added a comment -

        Sorry, the latter case is probably okay due to the "L" on "8L". But the former case is just int * int.
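
        The distinction between the two expressions comes down to Java's binary numeric promotion: a long operand promotes the whole multiplication to 64 bits, while int * int stays in 32 bits even if the result is later assigned to a long. A minimal sketch, assuming a placeholder value for HEADER_LENGTH_IDX (the real constant is not shown in this issue) and hypothetical class and method names:

```java
// Hypothetical illustration of numeric promotion; not Lucene code.
final class PromotionDemo {

    // Placeholder value, assumed only for this sketch.
    static final int HEADER_LENGTH_IDX = 0;

    // int * int: multiplied in 32 bits, then widened to long on return.
    static long intTimesInt(int docID, int size) {
        return docID * size;
    }

    // int * long literal: docID is promoted to long before multiplying,
    // and the int constant is promoted for the addition as well.
    static long intTimesLongLiteral(int docID) {
        return HEADER_LENGTH_IDX + docID * 8L;
    }

    public static void main(String[] args) {
        int docID = 300_000_000;
        System.out.println(intTimesInt(docID, 8));       // -1894967296
        System.out.println(intTimesLongLiteral(docID));  // 2400000000
    }
}
```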

        Michael McCandless added a comment -

        Thanks Walt, nice find!

        I attached patch w/ 3 places that could overflow ... can anyone find any others in the DocValues sources...?

        Michael McCandless added a comment -

        Thanks Walt!

        Walt Elder added a comment -

        Awesome. Thanks all.

        Simon Willnauer added a comment -

        thanks mike!


          People

          • Assignee: Unassigned
          • Reporter: Walt Elder
          • Votes: 0
          • Watchers: 5

          Time Tracking

          • Original Estimate: 2h
          • Remaining Estimate: 2h
          • Time Spent: Not Specified
