[DRILL-5491] NPE when reading a CSV file, with headers, but blank header line - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.0
Fix Version/s: 1.17.0
Component/s: None
Labels: None

Description

See ~~DRILL-5490~~ for background.

Try this unit test case:

    FixtureBuilder builder = ClusterFixture.builder()
        .maxParallelization(1);

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
      client.queryBuilder().sql(sql).printCsv();
    }

The test can also be run as a query using your favorite client.

Using this input file:

a,b,c
d,e,f

(The first line of the file is blank; the two lines shown above follow it.)
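For convenience, here is a minimal sketch that writes such a file. The class name BlankHeaderCsv and its write() helper are made up for illustration. The target path is inferred from the workspace root (/tmp/data) and the table name (csv/test7.csv) used in the test above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Drill: creates the input file for this report.
public class BlankHeaderCsv {
  // An empty first line, followed by the two lines shown in the report.
  static final String CONTENT = "\na,b,c\nd,e,f\n";

  // Writes <dir>/csv/test7.csv and returns its path.
  static Path write(Path dir) throws IOException {
    Path file = dir.resolve("csv").resolve("test7.csv");
    Files.createDirectories(file.getParent());
    Files.write(file, CONTENT.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    // Defaults to the workspace root used in the test case above.
    Path file = write(Paths.get(args.length > 0 ? args[0] : "/tmp/data"));
    System.out.println("Wrote " + file);
  }
}
```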

The following is the result:

Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException

The RepeatedVarCharOutput class tries (but fails for the reasons outlined in ~~DRILL-5490~~) to detect this case.

The code crashes here in CompliantTextRecordReader.extractHeader():

    String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();

Because of bad code in RepeatedVarCharOutput.getTextOutput():

  public String [] getTextOutput () throws ExecutionSetupException {
    if (recordCount == 0 || fieldIndex == -1) {
      return null;
    }

    if (this.recordStart != characterData) {
      throw new ExecutionSetupException("record text was requested before finishing record");
    }

Since there is no text on the line, special code elsewhere (see ~~DRILL-5490~~) elects not to increment recordCount. (BTW: recordCount is the total count across batches; the in-batch count, batchIndex, was probably what was wanted here.) Since the count is zero, getTextOutput() returns null.

The author probably thought we'd get a zero-length record here, and that the second if-statement would throw an exception in this case. But see ~~DRILL-5490~~ for why this code does not actually work.

The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third (recordStart is not set correctly, so the exception would not be thrown).
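The null-return path can be reproduced in isolation. The following is a simplified sketch, not Drill code: HeaderOutput is a made-up stand-in for RepeatedVarCharOutput that keeps only the early return quoted above, and guardedFieldCount() shows one possible caller-side check. That check is an assumption for illustration, not necessarily the fix that was applied.

```java
public class NullHeaderSketch {
  /** Simplified stand-in for RepeatedVarCharOutput; not the Drill class. */
  static class HeaderOutput {
    int recordCount = 0;   // stays 0 because the blank line is never counted
    int fieldIndex = -1;   // no field was ever started
    String[] fields = new String[0];

    String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;       // same early return as in the snippet above
      }
      return fields;
    }
  }

  /** Mirrors the failing pattern: uses the result without a null check. */
  static int unguardedFieldCount(HeaderOutput out) {
    String[] fieldNames = out.getTextOutput();
    return fieldNames.length;  // NullPointerException when fieldNames is null
  }

  /** Hypothetical guard: turns the null into a descriptive error. */
  static int guardedFieldCount(HeaderOutput out) {
    String[] fieldNames = out.getTextOutput();
    if (fieldNames == null) {
      throw new IllegalStateException(
          "CSV header line is blank; cannot extract column names");
    }
    return fieldNames.length;
  }

  public static void main(String[] args) {
    HeaderOutput blank = new HeaderOutput();
    try {
      unguardedFieldCount(blank);
    } catch (NullPointerException e) {
      System.out.println("Unguarded call: NullPointerException");
    }
    try {
      guardedFieldCount(blank);
    } catch (IllegalStateException e) {
      System.out.println("Guarded call: " + e.getMessage());
    }
  }
}
```

A caller-side check like this only replaces the NPE with a readable message; the record-count and recordStart problems described in ~~DRILL-5490~~ would still need to be addressed separately.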

All that bad code is just fun and games until we get an NPE, however.

People

Assignee: Paul Rogers

Reporter: Paul Rogers

Votes: 0

Watchers: 2

Dates

Created: 09/May/17 05:03

Updated: 11/Oct/19 10:51

Resolved: 11/Oct/19 10:51