Apache Drill / DRILL-5451

Query on CSV file with header fails with an exception when a nonexistent column is requested if the file is over 4096 lines long

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Storage - Text & CSV
    • Labels:
      None
    • Environment:

      Tested on CentOS 7 and Ubuntu

      Description

      When querying a text (CSV) file with extractHeader set to true, selecting a nonexistent column works as expected (returns an "empty" value) when the file has 4096 lines or fewer (1 header plus up to 4095 data rows), but results in an IndexOutOfBoundsException when the file has 4097 lines or more.

      With Storage config:

      "csvh": {
        "type": "text",
        "extensions": [
          "csvh"
        ],
        "extractHeader": true,
        "delimiter": ","
      }
      

      In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line removed.
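For anyone without the attachment, a pair of files with this shape could be regenerated with a sketch like the one below. The row format (`line_no,line_description`, with data rows numbered from 2) is inferred from the query output in this report rather than taken from the attached file, and the helper names and the temporary output directory are hypothetical.

```python
import os
import tempfile


def write_csvh(path, total_lines):
    """Write a headered CSV with `total_lines` lines in total:
    one header line plus (total_lines - 1) data rows."""
    with open(path, "w", newline="\n") as f:
        f.write("line_no,line_description\n")  # header is line 1
        for n in range(2, total_lines + 1):     # data rows are lines 2..total_lines
            f.write(f"{n},this is line number {n}\n")


def count_lines(path):
    """Count the lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


# Assumption: a temp directory stands in for the /test directory used in the report.
out_dir = tempfile.mkdtemp()
paths = {}
for total in (4096, 4097):
    paths[total] = os.path.join(out_dir, f"{total}_lines.csvh")
    write_csvh(paths[total], total)
```

The two files differ only in the final line, so any difference in query behaviour between them comes from the extra data row.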

      Results:

      0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
      +----------+------------------------+
      | line_no  |    line_description    |
      +----------+------------------------+
      | 2        | this is line number 2  |
      | 3        | this is line number 3  |
      +----------+------------------------+
      2 rows selected (2.455 seconds)
      0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
      +----------+---------------------+
      | line_no  | non_existent_field  |
      +----------+---------------------+
      | 2        |                     |
      | 3        |                     |
      +----------+---------------------+
      2 rows selected (2.248 seconds)
      0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
      Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
      
      Fragment 0:0
      
      [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
      
        (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
          io.netty.buffer.DrillBuf.checkIndexD():123
          io.netty.buffer.DrillBuf.chk():147
          io.netty.buffer.DrillBuf.getInt():520
          org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
          org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
          org.apache.drill.exec.physical.impl.ScanBatch.next():234
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.physical.impl.BaseRootExec.next():104
          org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
          org.apache.drill.exec.physical.impl.BaseRootExec.next():94
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
          java.security.AccessController.doPrivileged():-2
          javax.security.auth.Subject.doAs():422
          org.apache.hadoop.security.UserGroupInformation.doAs():1657
          org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
          org.apache.drill.common.SelfCleaningRunnable.run():38
          java.util.concurrent.ThreadPoolExecutor.runWorker():1142
          java.util.concurrent.ThreadPoolExecutor$Worker.run():617
          java.lang.Thread.run():745 (state=,code=0)
      0: jdbc:drill:zk=local> 
      

      This seems similar to the issue fixed in DRILL-4108 but it now only manifests for longer files.

      I also see a similar result (i.e. it works for <= 4096 lines, IndexOutOfBoundsException for > 4096 lines) for a SELECT count(*) ... from these files.
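The numbers in the error message line up with the 4096-line threshold, which the following sketch tries to make concrete. It is only a reading of the message, not a confirmed diagnosis. It assumes that the failing buffer is a 16384-byte offset buffer with 4-byte entries, and that setting the value count to n reads offset entry n (so n values need n + 1 offset slots). The function names are hypothetical.

```python
OFFSET_WIDTH = 4         # bytes per offset entry ("length: 4" in the error)
BUFFER_CAPACITY = 16384  # bytes ("expected: range(0, 16384)" in the error)


def in_bounds(index, length, capacity):
    """Return True if reading `length` bytes at byte `index` stays inside the buffer."""
    return index >= 0 and index + length <= capacity


def offset_read_position(value_count, width=OFFSET_WIDTH):
    """Byte position of offset entry `value_count`.
    Assumption: this is the entry read when the value count is set."""
    return value_count * width


# The buffer holds this many 4-byte offset entries.
slots = BUFFER_CAPACITY // OFFSET_WIDTH

for data_rows in (4095, 4096):
    pos = offset_read_position(data_rows)
    print(data_rows, pos, in_bounds(pos, OFFSET_WIDTH, BUFFER_CAPACITY))
```

Under these assumptions the buffer has room for 4096 offset entries. A file with 4095 data rows reads at byte 16380, which fits, while a file with 4096 data rows (4097 lines including the header) reads at byte 16384, the index reported in the exception.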

        Attachments

        1. 4097_lines.csvh
          118 kB
          Paul Wilson

              People

              • Assignee: Paul Rogers (Paul.Rogers)
              • Reporter: Paul Wilson (pgfw)