Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.0.0
Fix Version/s: None
Description
Tested on 1.0 with commit id:
select commit_id from sys.version;
+-------------------------------------------+
|                 commit_id                 |
+-------------------------------------------+
| d8b19759657698581cc0d01d7038797952888123  |
+-------------------------------------------+
1 row selected (0.097 seconds)
When the source data has column names like "dir0", "dir1", ..., the query may fail with "java.lang.IndexOutOfBoundsException", because Drill reserves the dirN names for its implicit directory partition columns (see the sketch after the stack trace).
For example:
> select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
Message:
Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
Total records read: 0
Mock records read: 0
Records to read: 32768
Row group index: 0
Records in row group: 1
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
  optional int32 id;
  optional binary dir999;
}
, metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32 [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745
  Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
    io.netty.buffer.DrillBuf.checkIndexD():189
    io.netty.buffer.DrillBuf.chk():211
    io.netty.buffer.DrillBuf.getInt():491
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
    org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
    org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
    org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
    org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1469
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
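For context on the collision: dir0, dir1, ... are the labels Drill gives to the implicit partition columns when you query a directory tree, so a real dir999 column in the file clashes with them. A minimal sketch of the normal usage, assuming a hypothetical layout /data/logs/2014/..., /data/logs/2015/... (the path is made up for illustration):

-- dir0 below is Drill's implicit first-level partition column,
-- not a column stored in the files themselves.
> select dir0, count(*) from dfs.`/data/logs` group by dir0;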
My thoughts:
We need to fix this by:
1. Either prompting a readable error message saying "dirN" is a reserved column name, and asking the user to change drill.exec.storage.file.partition.column.label to something else;
2. and/or, if the source data has dirN columns, letting them override our reserved "dirN";
3. we also need to document "drill.exec.storage.file.partition.column.label" at http://drill.apache.org/docs/querying-directories/;
4. drill.exec.storage.file.partition.column.label is a system-level option, so using it as a workaround (as sketched below) impacts the whole system. Can we make it settable at the session level?
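A sketch of that workaround, assuming 'xdir' as the replacement label (the new label name is arbitrary, and the rename is assumed to resolve the collision as suggested above):

-- Rename the implicit partition column label system-wide
-- (note: this affects every user and session on the cluster).
> alter system set `drill.exec.storage.file.partition.column.label` = 'xdir';
-- Directory partitions now appear as xdir0, xdir1, ..., so the file's
-- real dir999 column should be readable as an ordinary column:
> select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;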