Details
- Bug
- Status: Patch Available
- Blocker
- Resolution: Unresolved
- Affects: 3.3.0, 3.2.1
- None
- Patch
Description
- org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryMixedLengths1
- org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryUnknownLength
- org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryMixedLengths1
- org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryUnknownLength
The four actively used tests above call the helper `TestTFileStreams#writeRecords()` to write key-value (KV) pairs, then call `TestTFileByteArrays#readRecords()` to assert that the key and the value of each pair match what was written. Every value is a hardcoded string of length 6.
`readRecords()` uses `org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValueLength()` to get the full length of each value. However, `getValueLength()` can only return the full length when it is less than the configuration parameter `tfile.io.chunk.size`; otherwise it throws an exception. So when `tfile.io.chunk.size` is set to a value less than 6, these four tests fail with an exception from `readRecords()`, even though such small values are valid settings for `tfile.io.chunk.size`.
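As a minimal self-contained sketch of the failure mode (a mock written for this report, not the real TFile API; `MockEntry` and its fields are illustrative names):

```java
// MockEntry is a hypothetical stand-in for TFile.Reader.Scanner.Entry,
// modeling only the length-known semantics described above.
final class MockEntry {
    private final int valueLength; // actual length of the stored value
    private final int chunkSize;   // models tfile.io.chunk.size

    MockEntry(int valueLength, int chunkSize) {
        this.valueLength = valueLength;
        this.chunkSize = chunkSize;
    }

    // The full length is only known up front when the value fits in one chunk.
    boolean isValueLengthKnown() {
        return valueLength < chunkSize;
    }

    // Models getValueLength(): fails when the length is not known, which is
    // why readRecords() throws for small chunk sizes.
    int getValueLength() {
        if (!isValueLengthKnown()) {
            throw new IllegalStateException("value length is not known");
        }
        return valueLength;
    }
}
```

With a 6-byte value and a chunk size below 6, `getValueLength()` here throws, mirroring the test failures described above.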
The definition of `tfile.io.chunk.size` is "Value chunk size in bytes. Default to 1MB. Values of the length less than the chunk size is guaranteed to have known value length in read time (See also TFile.Reader.Scanner.Entry.isValueLengthKnown())".
Fixes
`readRecords()` should instead call `org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValue(byte[])`, which returns the value's correct full length regardless of whether it exceeds `tfile.io.chunk.size`.
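A minimal self-contained model of why `getValue(byte[])` avoids the problem (again a mock of the two read paths discussed in this report, not the real TFile classes; `ModelEntry` is an illustrative name):

```java
// ModelEntry is a hypothetical stand-in for TFile.Reader.Scanner.Entry,
// contrasting the failing read path with the proposed one.
final class ModelEntry {
    private final byte[] value;  // the stored value bytes
    private final int chunkSize; // models tfile.io.chunk.size

    ModelEntry(byte[] value, int chunkSize) {
        this.value = value;
        this.chunkSize = chunkSize;
    }

    // Old path: only works when the value fits in a single chunk.
    int getValueLength() {
        if (value.length >= chunkSize) {
            throw new IllegalStateException("value length is not known");
        }
        return value.length;
    }

    // Proposed path: reads the whole value into buf and returns its actual
    // full length, independent of the chunk size.
    int getValue(byte[] buf) {
        System.arraycopy(value, 0, buf, 0, value.length);
        return value.length;
    }
}
```

With a 6-byte value and a chunk size of 4, `getValueLength()` throws while `getValue(byte[])` still returns 6, which is the behavior the proposed fix relies on.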