Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: None
Environment: Tested on CentOS 7 and Ubuntu
Description
When querying a text (CSV) file with extractHeader set to true, selecting a non-existent column works as expected (returns an "empty" value) when the file has 4096 lines or fewer (1 header plus 4095 data lines), but results in an IndexOutOfBoundsException when the file has 4097 lines or more.
With storage config:

"csvh": {
  "type": "text",
  "extensions": ["csvh"],
  "extractHeader": true,
  "delimiter": ","
}
In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line removed.
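For reference, a minimal Java sketch that could generate such test files; the file names, column names, and row format are inferred from the query output below, not taken from the original report:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MakeCsvh {
    // Writes one header row plus (totalLines - 1) data rows, matching the
    // values visible in the query output below.
    static void write(String path, int totalLines) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("line_no,line_description");           // header = line 1
        for (int i = 2; i <= totalLines; i++) {
            lines.add(i + ",this is line number " + i);  // data = lines 2..totalLines
        }
        Files.write(Paths.get(path), lines);
    }

    public static void main(String[] args) throws IOException {
        write("/test/4096_lines.csvh", 4096); // queries succeed
        write("/test/4097_lines.csvh", 4097); // nonexistent-column query fails
    }
}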
Results:
0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
+----------+------------------------+
| line_no  |    line_description    |
+----------+------------------------+
| 2        | this is line number 2  |
| 3        | this is line number 3  |
+----------+------------------------+
2 rows selected (2.455 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
+----------+---------------------+
| line_no  | non_existent_field  |
+----------+---------------------+
| 2        |                     |
| 3        |                     |
+----------+---------------------+
2 rows selected (2.248 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))

Fragment 0:0

[Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]

  (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
    io.netty.buffer.DrillBuf.checkIndexD():123
    io.netty.buffer.DrillBuf.chk():147
    io.netty.buffer.DrillBuf.getInt():520
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
    org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
    org.apache.drill.exec.physical.impl.ScanBatch.next():234
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
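The numbers in the trace are consistent with an off-by-one in the VarChar vector's offsets buffer: value i spans [offsets[i], offsets[i+1]), so 4096 values need 4097 four-byte offset entries, yet the buffer bound in the error is exactly 4096 * 4 = 16384 bytes. A minimal sketch of that arithmetic (my reading of the trace, not confirmed against the Drill source):

public class OffsetOffByOne {
    public static void main(String[] args) {
        // Assumption (inferred from the trace, not from Drill code): the
        // offsets buffer was sized for exactly 4096 four-byte entries, but
        // setValueCount(4096) reads offsets[4096], i.e. byte index 16384.
        int valueCount = 4096;                // rows in one batch
        int allocatedBytes = valueCount * 4;  // 16384 -> "range(0, 16384)"
        int readIndex = valueCount * 4;       // 16384 -> "index: 16384, length: 4"
        System.out.println("4-byte read at " + readIndex + " into a "
                + allocatedBytes + "-byte buffer: "
                + (readIndex + 4 > allocatedBytes ? "out of bounds" : "ok"));
    }
}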
This seems similar to the issue fixed in DRILL-4108, but it now manifests only for longer files.
I also see a similar result (i.e. it works for <= 4096 lines, but throws an IndexOutOfBoundsException for > 4096 lines) for a
SELECT count(*) ...
from these files.
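A concrete query of that shape, using the files above (illustrative; the original report elides the full statement):

0: jdbc:drill:zk=local> select count(*) from dfs.`/test/4097_lines.csvh`;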
Issue Links
- relates to DRILL-5470 Offset vector data corruption with CSV data (Closed)