Description
In Patched Base encoding, the upper three bits of headerThirdByte store the base value width (the number of bytes occupied by the base, minus one). If Math.abs(min) is greater than or equal to 1L << 56, baseBytes is 9, so bb = (9 - 1) << 5 = 256, which is out of range for a byte.
```java
final boolean isNegative = min < 0 ? true : false;
if (isNegative) {
  min = -min;
}

// find the number of bytes required for base and shift it by 5 bits
// to accommodate patch width. The additional bit is used to store the sign
// of the base value.
final int baseWidth = utils.findClosestNumBits(min) + 1;
final int baseBytes = baseWidth % 8 == 0 ? baseWidth / 8 : (baseWidth / 8) + 1;
final int bb = (baseBytes - 1) << 5;

// if the base value is negative then set MSB to 1
if (isNegative) {
  min |= (1L << ((baseBytes * 8) - 1));
}

// third byte contains 3 bits for number of bytes occupied by base
// and 5 bits for patchWidth
final int headerThirdByte = bb | utils.encodeBitWidth(patchWidth);
```
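The sketch below replays the writer's byte-count arithmetic for three minimum values around the 2^56 boundary. It assumes a hypothetical helper, closestFixedBits, standing in for utils.findClosestNumBits(); its rounding table (any width above 56 bits rounds up to 64) is an assumption modelled on ORC's fixed bit widths, not code taken from ORC.

```java
final class BaseWidthDemo {
    // Hypothetical stand-in for utils.findClosestNumBits(): count the bits
    // needed for a non-negative value, then round up to a fixed width.
    // ASSUMPTION: this rounding table mirrors ORC's fixed bit widths.
    static int closestFixedBits(long value) {
        int n = 64 - Long.numberOfLeadingZeros(value);
        if (n == 0) return 1;
        if (n <= 24) return n;
        if (n <= 26) return 26;
        if (n <= 28) return 28;
        if (n <= 30) return 30;
        if (n <= 32) return 32;
        if (n <= 40) return 40;
        if (n <= 48) return 48;
        if (n <= 56) return 56;
        return 64;
    }

    // Same arithmetic as the writer snippet: one extra bit for the sign,
    // then round the width up to whole bytes.
    static int baseBytes(long min) {
        long magnitude = min < 0 ? -min : min;
        int baseWidth = closestFixedBits(magnitude) + 1;
        return baseWidth % 8 == 0 ? baseWidth / 8 : (baseWidth / 8) + 1;
    }

    public static void main(String[] args) {
        long[] mins = { -(1L << 55), -((1L << 56) - 1), -(1L << 56) };
        for (long min : mins) {
            int bytes = baseBytes(min);
            int bb = (bytes - 1) << 5; // 3-bit field shifted into bits 5..7
            System.out.println(min + " -> baseBytes=" + bytes + ", bb=" + bb);
        }
    }
}
```

Under these assumptions the first two minimums give baseBytes=8 and bb=224, which still fits in a byte, while -(1L << 56) gives baseBytes=9 and bb=256, which needs a ninth bit.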
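Only the eight low-order bits of headerThirdByte are written to the stream, so bit 8 of bb is dropped and the upper three bits of the written byte are 0. RunLengthIntegerReaderV2 then decodes a base width of 1 byte instead of 9, so the values it reads back for the column are incorrect.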
```java
// extract the number of bytes occupied by base
int thirdByte = input.read();
int bw = (thirdByte >>> 5) & 0x07;
// base width is one off
bw += 1;
```
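To make the truncation concrete, this sketch round-trips the third header byte for every baseBytes value from 1 to 9. The method names are hypothetical; the masking with 0xff models the java.io.OutputStream.write(int) contract, which writes only the eight low-order bits, and the decoding copies the reader snippet. The encoded patch width is passed in directly because it only occupies the low five bits.

```java
final class HeaderByteDemo {
    // Writer side: same expression as bb | utils.encodeBitWidth(patchWidth),
    // with the already-encoded patch width supplied by the caller.
    static int headerThirdByte(int baseBytes, int encodedPatchWidth) {
        int bb = (baseBytes - 1) << 5;
        return bb | encodedPatchWidth;
    }

    // Only the eight low-order bits reach the stream.
    static int writtenByte(int headerThirdByte) {
        return headerThirdByte & 0xff;
    }

    // Reader side: same decoding as RunLengthIntegerReaderV2.
    static int decodedBaseBytes(int thirdByte) {
        int bw = (thirdByte >>> 5) & 0x07;
        return bw + 1; // base width is one off
    }

    public static void main(String[] args) {
        for (int baseBytes = 1; baseBytes <= 9; baseBytes++) {
            int header = headerThirdByte(baseBytes, 0);
            int decoded = decodedBaseBytes(writtenByte(header));
            System.out.println("baseBytes=" + baseBytes + " decoded=" + decoded);
        }
    }
}
```

For baseBytes 1 through 8 the decoded value matches, but for baseBytes=9 the header is 256, the written byte is 0, and the reader decodes a 1-byte base.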
In some cases, RunLengthIntegerReaderV2 fails with an EOFException:
Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 2 kind DATA position: 3213835 length: 3213835 range: 0 offset: 3217373 limit: 3217373 range 0 = 0 to 3213835 uncompressed: 184478 to 184478
at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:587)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 20 more
For example, consider the following sequence:
```java
long[] data = {
    -9007199254740992L, -8725724278030337L, -1125762467889153L, -1L,
    -9007199254740992L, -9007199254740992L, -497L, 127L,
    -1L, -72057594037927936L, -4194304L, -9007199254740992L,
    -4503599593816065L, -4194304L, -8936830510563329L, -9007199254740992L,
    -1L, -70334384439312L, -4063233L, -6755399441973249L
};
```
The minimum value is -72057594037927936 (-1L << 56). RLEv2 writes this sequence using Patched Base encoding, and RunLengthIntegerReaderV2 reads back the following data:
[281474976710656, 36275087623585792, 247390116249599, 72053196528287743, 72057594037927935, 72022409665839104, 246290604621824, -71776119061217282, 4222124650659840, 36028797018963967, 71776119061217280, 281474976694272, 246290604621824, 263882790797311, 72057594037911552, 246565482528767, 72022409665839104, 281474976710655, 72057319294238719, 67835469387252223]
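A quick check that the example sequence hits the nine-byte case: the sketch below finds the minimum of the reported data and applies the threshold stated in this report. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ExampleMinDemo {
    // The sequence from this report.
    static final long[] DATA = {
        -9007199254740992L, -8725724278030337L, -1125762467889153L, -1L,
        -9007199254740992L, -9007199254740992L, -497L, 127L,
        -1L, -72057594037927936L, -4194304L, -9007199254740992L,
        -4503599593816065L, -4194304L, -8936830510563329L, -9007199254740992L,
        -1L, -70334384439312L, -4063233L, -6755399441973249L
    };

    static long minOf(long[] values) {
        return Arrays.stream(values).min().getAsLong();
    }

    public static void main(String[] args) {
        long min = minOf(DATA);
        // Threshold from this report: |min| >= 1L << 56 yields baseBytes = 9.
        boolean nineByteBase = Math.abs(min) >= (1L << 56);
        System.out.println("min=" + min + ", nineByteBase=" + nineByteBase);
    }
}
```

It prints min=-72057594037927936, nineByteBase=true, so this sequence lands exactly on the boundary.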