Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
1.10.0
-
None
-
None
-
None
Description
The class RepeatedVarCharOutput, used to read text data into the columns array, has a member called recordStart which is supposed to track the memory offset of the start of the record data in the columns VarChar array. However, the logic for the member is very wrong. First, it is initialized based on an uninitialized variable:
code
public void startBatch()
private void loadVarCharDataAddress()
{ ... this.characterDataOriginal = buf.memoryAddress(); ... }code
Second, the class keeps track of actual memory addresses for the VarChar data. When data would exceed the buffer, memory is reallocated using the following method. But, this method does not adjust the recordStart member, which now points to a bogus memory address:
private void expandVarCharData(){ vector.getDataVector().reAlloc(); long diff = characterData - characterDataOriginal; loadVarCharDataAddress(); characterData += diff; }
Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable is never used in a way that cause obvious problems (else these issues would have been found sooner.) The only real use is:
@Override public boolean rowHasData() { return recordStart < characterData; }
Which, it turns out, is used only at EOF in TextReader.parseRecord():
} catch(StreamFinishedPseudoException e) { // if we've written part of a field or all of a field, we should send this row. if (fieldsWritten == 0 && !output.rowHasData()) { throw e; } }
Thus, the bug will cause a failure only if:
- A batch is resized, and
- The new address is lower than the original address, and
- It is the last batch in the file.
Because of the low probability of hitting the bug, the priority is set to Minor.