Apache Drill / DRILL-3118

Improve error messaging if table has name that conflicts with partition label


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: Future
    • Component/s: Execution - Flow
    • Labels:
      None

      Description

      Tested on 1.0 with commit id:

      select commit_id from sys.version;
      +-------------------------------------------+
      |                 commit_id                 |
      +-------------------------------------------+
      | d8b19759657698581cc0d01d7038797952888123  |
      +-------------------------------------------+
      1 row selected (0.097 seconds)
      

      When the source data has a column whose name matches the implicit partition column pattern ("dir0", "dir1", ..., "dirN"), queries against it may fail with "java.lang.IndexOutOfBoundsException".
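The collision arises because Drill injects implicit directory-partition columns named `<label>N`, where the label defaults to "dir" (controlled by drill.exec.storage.file.partition.column.label). The following is a minimal sketch of that naming pattern, for illustration only; it is not Drill's actual matching code, and the function name is hypothetical:

```python
import re

# Default value of drill.exec.storage.file.partition.column.label; the exact
# matching logic inside Drill may differ from this illustration.
PARTITION_LABEL = "dir"

def is_reserved_partition_column(name: str, label: str = PARTITION_LABEL) -> bool:
    """Return True if a column name looks like an implicit partition column <label><N>."""
    return re.fullmatch(re.escape(label) + r"\d+", name) is not None

print(is_reserved_partition_column("dir999"))   # collides with the partition pattern
print(is_reserved_partition_column("dirname"))  # no trailing digits, no collision
```

Under this pattern, a real source column named "dir999" is indistinguishable from an implicit partition column, which is what triggers the failure above.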

      For example:

      > select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
      Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
      
      Fragment 0:0
      
      [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]
      
        (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
      Message:
      Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
      Total records read: 0
      Mock records read: 0
      Records to read: 32768
      Row group index: 0
      Records in row group: 1
      Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
        optional int32 id;
        optional binary dir999;
      }
      , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32  [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY  [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
          org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
          org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
          org.apache.drill.exec.physical.impl.ScanBatch.next():175
          org.apache.drill.exec.physical.impl.BaseRootExec.next():83
          org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
          org.apache.drill.exec.physical.impl.BaseRootExec.next():73
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
          java.security.AccessController.doPrivileged():-2
          javax.security.auth.Subject.doAs():422
          org.apache.hadoop.security.UserGroupInformation.doAs():1469
          org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
          org.apache.drill.common.SelfCleaningRunnable.run():38
          java.util.concurrent.ThreadPoolExecutor.runWorker():1142
          java.util.concurrent.ThreadPoolExecutor$Worker.run():617
          java.lang.Thread.run():745
        Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
          io.netty.buffer.DrillBuf.checkIndexD():189
          io.netty.buffer.DrillBuf.chk():211
          io.netty.buffer.DrillBuf.getInt():491
          org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
          org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
          org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
          org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
          org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
          org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
          org.apache.drill.exec.physical.impl.ScanBatch.next():175
          org.apache.drill.exec.physical.impl.BaseRootExec.next():83
          org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
          org.apache.drill.exec.physical.impl.BaseRootExec.next():73
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
          java.security.AccessController.doPrivileged():-2
          javax.security.auth.Subject.doAs():422
          org.apache.hadoop.security.UserGroupInformation.doAs():1469
          org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
          org.apache.drill.common.SelfCleaningRunnable.run():38
          java.util.concurrent.ThreadPoolExecutor.runWorker():1142
          java.util.concurrent.ThreadPoolExecutor$Worker.run():617
          java.lang.Thread.run():745 (state=,code=0)
      

      My thoughts:
      We could fix this by:
      1. Printing a readable message saying that "dirN" is a reserved column name and advising the user to change drill.exec.storage.file.partition.column.label to something else;
      2. And/or, when the source data has its own dirN columns, letting them override the reserved "dirN" partition columns.
      3. We should also document "drill.exec.storage.file.partition.column.label" at http://drill.apache.org/docs/querying-directories/
      4. drill.exec.storage.file.partition.column.label is a system-level option; using it as a workaround affects the whole system. Can we make it settable at the session level?
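As a possible workaround sketch (the system-wide change that point 4 above warns about), the partition label can be renamed so it no longer collides with the table's own column; the replacement label 'part' is an arbitrary example:

```sql
-- System-wide: renames the implicit partition column label on every query,
-- so the table's own "dir999" column no longer conflicts with it.
ALTER SYSTEM SET `drill.exec.storage.file.partition.column.label` = 'part';

-- The source column should then be readable normally:
select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
```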

            People

            • Assignee: Padma Penumarthy (ppenumarthy)
            • Reporter: Hao Zhu (haozhu)
            • Votes: 0
            • Watchers: 2
