Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
1.8.0
None
None
Description
See DRILL-5490 for background.
Try this unit test case:
FixtureBuilder builder = ClusterFixture.builder()
    .maxParallelization(1);
try (ClusterFixture cluster = builder.build();
     ClientFixture client = cluster.clientFixture()) {
  TextFormatConfig csvFormat = new TextFormatConfig();
  csvFormat.fieldDelimiter = ',';
  csvFormat.skipFirstLine = false;
  csvFormat.extractHeader = true;
  cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
  String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
  client.queryBuilder().sql(sql).printCsv();
}
}
The test can also be run as a query using your favorite client.
Using this input file:
a,b,c
d,e,f
(The first line of the file is blank; it precedes the a,b,c line.)
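To make the reproduction less error-prone, the input file can be generated rather than typed by hand. The following is a minimal sketch, assuming the workspace root /tmp/data and the relative path csv/test7.csv taken from the test case above; the class name BlankHeaderFile and its write() helper are hypothetical and not part of Drill.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Drill): creates the input file for the repro.
public class BlankHeaderFile {

    // Writes csv/test7.csv under the given workspace root.
    // The first line is deliberately empty so that header extraction sees a blank line.
    static Path write(Path workspaceRoot) throws IOException {
        Path file = workspaceRoot.resolve("csv").resolve("test7.csv");
        Files.createDirectories(file.getParent());
        List<String> lines = Arrays.asList("", "a,b,c", "d,e,f");
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the workspace directory used in the test case; override via args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/data");
        System.out.println("Wrote " + write(root));
    }
}
```

Run it once before the test case, or pass a different directory as the first argument if the workspace is defined elsewhere.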
The following is the result:
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NullPointerException
The RepeatedVarCharOutput class tries (but fails for the reasons outlined in DRILL-5490) to detect this case.
The code crashes here in CompliantTextRecordReader.extractHeader():
String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
Because of bad code in RepeatedVarCharOutput.getTextOutput():
public String [] getTextOutput () throws ExecutionSetupException {
  if (recordCount == 0 || fieldIndex == -1) {
    return null;
  }
  if (this.recordStart != characterData) {
    throw new ExecutionSetupException("record text was requested before finishing record");
  }
Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment recordCount. (Note that recordCount is the total count across batches; the in-batch count, batchIndex, was probably what was wanted here.) Since the count is zero, the method returns null.
The author probably expected a zero-length record in this case, for which the second if-statement would throw an exception. But see DRILL-5490 for why this code does not actually work.
The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third (recordStart is not set correctly, so the exception would not be thrown).
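The null-return part of this chain can be shown in isolation. The sketch below is a simplified stand-in, not Drill code: getTextOutput() copies only the early-return guard quoted above, countHeaderFieldsUnsafe() assumes the caller dereferences the result without a null check, and headerOrEmpty() is a hypothetical guard for illustration rather than the fix that was applied.

```java
// Simplified stand-in for the failure chain; all names here are illustrative.
public class NullHeaderSketch {

    // Mirrors the early-return guard: with no records counted, the result is null
    // even though the caller expects an array of field names.
    static String[] getTextOutput(int recordCount, int fieldIndex, String[] parsed) {
        if (recordCount == 0 || fieldIndex == -1) {
            return null;
        }
        return parsed;
    }

    // Assumed caller behavior: uses the array directly, so a null input throws.
    static int countHeaderFieldsUnsafe(String[] fieldNames) {
        return fieldNames.length;
    }

    // Hypothetical guard: treat a missing header as an empty list of fields.
    static String[] headerOrEmpty(String[] fieldNames) {
        return fieldNames == null ? new String[0] : fieldNames;
    }

    public static void main(String[] args) {
        // A blank header line: record count never incremented, no field started.
        String[] header = getTextOutput(0, -1, new String[0]);
        try {
            countHeaderFieldsUnsafe(header);
        } catch (NullPointerException e) {
            System.out.println("Unchecked caller failed with NullPointerException");
        }
        System.out.println("Guarded field count: " + headerOrEmpty(header).length);
    }
}
```

Whether an empty header should produce zero columns or a user-facing error is a design decision for the reader; the point here is only that a null return from the output class reaches the caller unchecked.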
All that bad code is just fun and games until we get an NPE, however.