Apache Drill / DRILL-5451

Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.17.0
    • Component/s: Storage - Text & CSV
    • Labels: None
    • Environment: Tested on CentOS 7 and Ubuntu

    Description

      When querying a text (CSV) file with extractHeader set to true, selecting a non-existent column works as expected (returns an "empty" value) when the file has 4096 lines or fewer (1 header plus 4095 data rows), but results in an IndexOutOfBoundsException when the file has 4097 lines or more.

      With Storage config:

      "csvh": {
            "type": "text",
            "extensions": [
              "csvh"
            ],
            "extractHeader": true,
            "delimiter": ","
          }
      

      In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line removed.
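      For reference, a minimal sketch of how two such test files could be generated. The file names and column layout follow the sample output shown in the results; the exact contents of the reporter's files are an assumption:

```python
def make_lines(n_lines):
    """Header plus (n_lines - 1) data rows; line_no starts at 2,
    matching the sample query output (data begins on file line 2)."""
    rows = ["line_no,line_description"]
    rows += [f"{i},this is line number {i}" for i in range(2, n_lines + 1)]
    return rows

def write_file(path, n_lines):
    with open(path, "w") as f:
        f.write("\n".join(make_lines(n_lines)) + "\n")

write_file("4096_lines.csvh", 4096)  # 1 header + 4095 data rows: query works
write_file("4097_lines.csvh", 4097)  # 1 header + 4096 data rows: triggers the IOBE
```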

      Results:

      0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
      +----------+------------------------+
      | line_no  |    line_description    |
      +----------+------------------------+
      | 2        | this is line number 2  |
      | 3        | this is line number 3  |
      +----------+------------------------+
      2 rows selected (2.455 seconds)
      0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
      +----------+---------------------+
      | line_no  | non_existent_field  |
      +----------+---------------------+
      | 2        |                     |
      | 3        |                     |
      +----------+---------------------+
      2 rows selected (2.248 seconds)
      0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
      Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
      
      Fragment 0:0
      
      [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
      
        (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
          io.netty.buffer.DrillBuf.checkIndexD():123
          io.netty.buffer.DrillBuf.chk():147
          io.netty.buffer.DrillBuf.getInt():520
          org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
          org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
          org.apache.drill.exec.physical.impl.ScanBatch.next():234
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.record.AbstractRecordBatch.next():119
          org.apache.drill.exec.record.AbstractRecordBatch.next():109
          org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
          org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.physical.impl.BaseRootExec.next():104
          org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
          org.apache.drill.exec.physical.impl.BaseRootExec.next():94
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
          java.security.AccessController.doPrivileged():-2
          javax.security.auth.Subject.doAs():422
          org.apache.hadoop.security.UserGroupInformation.doAs():1657
          org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
          org.apache.drill.common.SelfCleaningRunnable.run():38
          java.util.concurrent.ThreadPoolExecutor.runWorker():1142
          java.util.concurrent.ThreadPoolExecutor$Worker.run():617
          java.lang.Thread.run():745 (state=,code=0)
      0: jdbc:drill:zk=local> 
      

      This seems similar to the issue fixed in DRILL-4108 but it now only manifests for longer files.

      I also see a similar result (i.e. it works for <= 4096 lines, IndexOutOfBoundsException for > 4096 lines) for a

       SELECT count(*) ...

      query against these files.
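      The numbers in the error message are suggestive: 16384 bytes is exactly 4096 four-byte entries, so the failure appears consistent with setValueCount reading the offset one entry past a buffer sized for 4096 values. A quick sanity check of that arithmetic (an illustration of the reported bounds, not Drill code):

```python
# Assumption: UInt4Vector stores one 4-byte offset per value in a DrillBuf.
RECORDS = 4096                         # data rows once the file exceeds 4096 lines
INT_WIDTH = 4                          # bytes per UInt4 offset entry
buf_len = RECORDS * INT_WIDTH          # 16384-byte offset buffer
read_index = RECORDS * INT_WIDTH       # getInt() for entry 4096 starts at byte 16384

# A 4-byte read at index 16384 falls outside range(0, 16384), matching
# "index: 16384, length: 4 (expected: range(0, 16384))".
out_of_range = not (0 <= read_index and read_index + INT_WIDTH <= buf_len)
```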

      People

        Assignee: Paul Rogers
        Reporter: Paul Wilson (pgfw)
        Votes: 0
        Watchers: 4
