Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.0.0
Fix Version/s: None
Description
Tested on 1.0 with commit id:
select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| d8b19759657698581cc0d01d7038797952888123  |
+-------------------------------------------+
1 row selected (0.097 seconds)
When the source data has column names like "dir0", "dir1", ..., the query may fail with "java.lang.IndexOutOfBoundsException", because Drill reserves the dirN names for its implicit directory partition columns (see the sketch after the stack trace).
For example:
> select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
Message:
Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
Total records read: 0
Mock records read: 0
Records to read: 32768
Row group index: 0
Records in row group: 1
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
  optional int32 id;
  optional binary dir999;
}
, metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745
  Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
    io.netty.buffer.DrillBuf.checkIndexD():189
    io.netty.buffer.DrillBuf.chk():211
    io.netty.buffer.DrillBuf.getInt():491
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
    org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
    org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
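For context on the collision: dir0, dir1, ... are the labels Drill gives to the implicit partition columns when you query a directory tree, so a real dir999 column in the file clashes with them. A minimal sketch of the normal usage, assuming a hypothetical layout /data/logs/2014/..., /data/logs/2015/... (the path is made up for illustration):

-- dir0 below is Drill's implicit first-level partition column,
-- not a column stored in the files themselves.
> select dir0, count(*) from dfs.`/data/logs` group by dir0;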
My thoughts:
We need to fix this by:
1. Either prompting a readable error message saying "dirN" is a reserved column name, and asking the user to change drill.exec.storage.file.partition.column.label to something else;
2. and/or, if the source data has dirN columns, letting them override our reserved "dirN";
3. we also need to document "drill.exec.storage.file.partition.column.label" at http://drill.apache.org/docs/querying-directories/;
4. drill.exec.storage.file.partition.column.label is a system-level option, so using it as a workaround (as sketched below) impacts the whole system. Can we make it settable at the session level?
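A sketch of that workaround, assuming 'xdir' as the replacement label (the new label name is arbitrary, and the rename is assumed to resolve the collision as suggested above):

-- Rename the implicit partition column label system-wide
-- (note: this affects every user and session on the cluster).
> alter system set `drill.exec.storage.file.partition.column.label` = 'xdir';
-- Directory partitions now appear as xdir0, xdir1, ..., so the file's
-- real dir999 column should be readable as an ordinary column:
> select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;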