  Apache Drill / DRILL-4767

Parquet reader throws IllegalArgumentException for int32 columns with GZIP compression


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels: None

      Description

      Created a small parquet file with the following schema:

      [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
      message test {
        required int32 int32_field_required;
        optional int32 int32_field_optional;
        repeated int32 int32_field_repeated;
      }
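For reference, the R: and D: values that `parquet-tools meta` reports for these three columns follow from the level rules Parquet inherits from Dremel: a column's maximum repetition level counts the REPEATED fields on its path, and its maximum definition level counts the fields on its path that are not REQUIRED. The sketch below is illustrative only; `MaxLevels` and its `Repetition` enum are hypothetical names, not parquet-mr classes, and it assumes flat top-level columns as in this schema.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: derives max repetition/definition levels from a column's path. */
public final class MaxLevels {

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    /** Max repetition level = number of REPEATED fields on the path. */
    static int maxRepetitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r == Repetition.REPEATED) {
                level++;
            }
        }
        return level;
    }

    /** Max definition level = number of fields on the path that are not REQUIRED. */
    static int maxDefinitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) {
                level++;
            }
        }
        return level;
    }

    public static void main(String[] args) {
        // The three top-level columns of `message test`; each path has a single field.
        String[] names = {"int32_field_required", "int32_field_optional", "int32_field_repeated"};
        Repetition[] kinds = {Repetition.REQUIRED, Repetition.OPTIONAL, Repetition.REPEATED};
        for (int i = 0; i < names.length; i++) {
            List<Repetition> path = Collections.singletonList(kinds[i]);
            System.out.println(names[i] + ": R:" + maxRepetitionLevel(path)
                    + " D:" + maxDefinitionLevel(path));
        }
    }
}
```

Running `main` prints R:0 D:0, R:0 D:1 and R:1 D:1 for the required, optional and repeated columns, which matches the meta output.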
      

      with the following metadata:

      [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
      file:                 file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
      creator:              parquet-mr version 1.8.2-SNAPSHOT (build 0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
      extra:                writer.model.name = example
      
      file schema:          test
      --------------------------------------------------------------------------------
      int32_field_required: REQUIRED INT32 R:0 D:0
      int32_field_optional: OPTIONAL INT32 R:0 D:1
      int32_field_repeated: REPEATED INT32 R:1 D:1
      
      row group 1:          RC:10 TS:147 OFFSET:4
      --------------------------------------------------------------------------------
      int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:DELTA_BINARY_PACKED
      int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC:DELTA_BINARY_PACKED
      int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC:DELTA_BINARY_PACKED
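The numbers in this metadata are internally consistent. Assuming `parquet-tools` prints SZ as compressed size / uncompressed size / ratio (uncompressed divided by compressed), each column chunk starts exactly where the previous one ends (4 + 65 = 69, 69 + 67 = 136), TS:147 equals the sum of the uncompressed sizes (47 + 49 + 51), and GZIP made every chunk slightly larger than its uncompressed form, which is unsurprising for 10 values. The hypothetical `MetaLayoutCheck` class below only re-derives those figures from values copied out of the output above; it does not read the file.

```java
import java.util.Locale;

/** Hypothetical cross-check of the `parquet-tools meta` figures in this report. */
public final class MetaLayoutCheck {

    // Copied from the meta output; SZ is assumed to be compressed/uncompressed/ratio.
    static final String[] COLUMNS = {"int32_field_required", "int32_field_optional", "int32_field_repeated"};
    static final long[] FIRST_PAGE_OFFSET = {4, 69, 136};
    static final long[] COMPRESSED = {65, 67, 69};
    static final long[] UNCOMPRESSED = {47, 49, 51};
    static final long ROW_GROUP_TOTAL_SIZE = 147; // TS, assumed to be the uncompressed total

    /** True if each chunk starts exactly where the previous (compressed) chunk ends. */
    static boolean chunksAreContiguous() {
        for (int i = 1; i < FIRST_PAGE_OFFSET.length; i++) {
            if (FIRST_PAGE_OFFSET[i] != FIRST_PAGE_OFFSET[i - 1] + COMPRESSED[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Sum of the uncompressed chunk sizes, to compare against TS. */
    static long sumUncompressed() {
        long total = 0;
        for (long size : UNCOMPRESSED) {
            total += size;
        }
        return total;
    }

    /** True if GZIP output is larger than the input for every chunk. */
    static boolean gzipGrewEveryChunk() {
        for (int i = 0; i < COMPRESSED.length; i++) {
            if (COMPRESSED[i] <= UNCOMPRESSED[i]) {
                return false;
            }
        }
        return true;
    }

    /** Ratio (uncompressed / compressed) formatted to two decimals, as in the SZ field. */
    static String ratio(int i) {
        return String.format(Locale.ROOT, "%.2f", (double) UNCOMPRESSED[i] / COMPRESSED[i]);
    }

    public static void main(String[] args) {
        System.out.println("contiguous chunks: " + chunksAreContiguous());
        System.out.println("TS == sum of uncompressed sizes: " + (sumUncompressed() == ROW_GROUP_TOTAL_SIZE));
        System.out.println("GZIP grew every chunk: " + gzipGrewEveryChunk());
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.println(COLUMNS[i] + " ratio " + ratio(i));
        }
    }
}
```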
      

      and the following dump output:

      [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
      row group 0
      --------------------------------------------------------------------------------
      int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
      int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
      int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...
      
          int32_field_required TV=10 RL=0 DL=0
          ----------------------------------------------------------------------------
          page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  [more]... VC:10
      
          int32_field_optional TV=10 RL=0 DL=1
          ----------------------------------------------------------------------------
          page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  [more]... VC:10
      
          int32_field_repeated TV=10 RL=1 DL=1
          ----------------------------------------------------------------------------
          page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  [more]... VC:10
      
      INT32 int32_field_required
      --------------------------------------------------------------------------------
      *** row group 1 of 1, values 1 to 10 ***
      value 1:  R:0 D:0 V:0
      value 2:  R:0 D:0 V:3
      value 3:  R:0 D:0 V:6
      value 4:  R:0 D:0 V:9
      value 5:  R:0 D:0 V:12
      value 6:  R:0 D:0 V:15
      value 7:  R:0 D:0 V:18
      value 8:  R:0 D:0 V:21
      value 9:  R:0 D:0 V:24
      value 10: R:0 D:0 V:27
      
      INT32 int32_field_optional
      --------------------------------------------------------------------------------
      *** row group 1 of 1, values 1 to 10 ***
      value 1:  R:0 D:1 V:1
      value 2:  R:0 D:1 V:4
      value 3:  R:0 D:1 V:7
      value 4:  R:0 D:1 V:10
      value 5:  R:0 D:1 V:13
      value 6:  R:0 D:1 V:16
      value 7:  R:0 D:1 V:19
      value 8:  R:0 D:1 V:22
      value 9:  R:0 D:1 V:25
      value 10: R:0 D:1 V:28
      
      INT32 int32_field_repeated
      --------------------------------------------------------------------------------
      *** row group 1 of 1, values 1 to 10 ***
      value 1:  R:0 D:1 V:2
      value 2:  R:0 D:1 V:5
      value 3:  R:0 D:1 V:8
      value 4:  R:0 D:1 V:11
      value 5:  R:0 D:1 V:14
      value 6:  R:0 D:1 V:17
      value 7:  R:0 D:1 V:20
      value 8:  R:0 D:1 V:23
      value 9:  R:0 D:1 V:26
      value 10: R:0 D:1 V:29
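The dump also pins down what the query should return. Every value has R:0, so each of the 10 rows holds exactly one element in `int32_field_repeated`, and every optional and repeated value has D:1 (its maximum), so none are null. Row i (counting from 0) therefore contains 3i, 3i + 1 and [3i + 2]. A minimal sketch of that expected result, which could serve as the oracle for a regression test; `ExpectedInt32Rows` and `Row` are hypothetical names, not Drill test utilities:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical test oracle: the rows the query should return, reconstructed from the dump. */
public final class ExpectedInt32Rows {

    static final class Row {
        final int required;
        final Integer optional;        // D:1 for every value, so never null in this file
        final List<Integer> repeated;  // R:0 for every value, so one element per row

        Row(int required, Integer optional, List<Integer> repeated) {
            this.required = required;
            this.optional = optional;
            this.repeated = repeated;
        }

        @Override
        public String toString() {
            return required + "\t" + optional + "\t" + repeated;
        }
    }

    /** Row i holds 3i, 3i + 1 and [3i + 2], matching the values in the dump. */
    static List<Row> expectedRows(int rowCount) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add(new Row(3 * i, 3 * i + 1, Collections.singletonList(3 * i + 2)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : expectedRows(10)) {
            System.out.println(row);
        }
    }
}
```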
      

      However, querying the file through Drill fails with the following error:

      0: jdbc:drill:schema=dfs.drillTestDir> select * from dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
      Error: SYSTEM ERROR: IllegalArgumentException
      
      Fragment 0:0
      
      [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] (state=,code=0)
      
      0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
      |     version     |                 commit_id                 |                              commit_message                               |        commit_time         |     build_email     |         build_time         |
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
      | 1.7.0-SNAPSHOT  | 1c9e92b0cec18b4ee5a005fd6006ad329e3fa568  | DRILL-4574: Avro Plugin: Flatten does not work correctly on record items  | 24.06.2016 @ 15:07:25 PDT  | inramana@gmail.com  | 27.06.2016 @ 10:38:46 PDT  |
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
      

      drillbit.log:

      2016-07-06 16:21:14,139 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 28826d94-a4bb-325d-6475-d440a1c78da0: select * from dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`
      2016-07-06 16:21:14,395 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
      2016-07-06 16:21:14,398 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 2ms total, 2.513895ms avg, 2ms max.
      2016-07-06 16:21:14,398 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 0.907000 μs, Latest start: 0.907000 μs, Average start: 0.907000 μs .
      2016-07-06 16:21:14,399 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Took 2 ms to read file metadata
      2016-07-06 16:21:14,518 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested AWAITING_ALLOCATION --> FAILED
      2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FAILED
      2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FAILED
      2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FINISHED
      2016-07-06 16:21:14,529 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException
      
      Fragment 0:0
      
      [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010]
      org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException
      
      Fragment 0:0
      
      [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010]
      	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
      Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet reader (complex).
      Message: Failure in setting up reader
      Parquet Metadata: ParquetMetaData{FileMetaData{schema: message test {
        required int32 int32_field_required;
        optional int32 int32_field_optional;
        repeated int32 int32_field_repeated;
      }
      , metadata: {writer.model.name=example}}, blocks: [BlockMetaData{10, 147 [ColumnMetaData{GZIP [int32_field_required] INT32  [DELTA_BINARY_PACKED], 4}, ColumnMetaData{GZIP [int32_field_optional] INT32  [DELTA_BINARY_PACKED], 69}, ColumnMetaData{GZIP [int32_field_repeated] INT32  [DELTA_BINARY_PACKED], 136}]}]}
      	at org.apache.drill.exec.store.parquet2.DrillParquetReader.handleAndRaise(DrillParquetReader.java:279) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:271) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:101) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:140) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:53) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:148) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:171) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:128) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:171) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:231) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	... 4 common frames omitted
      Caused by: java.lang.IllegalArgumentException: null
      	at java.nio.Buffer.limit(Buffer.java:267) ~[na:1.7.0_79]
      	at org.apache.parquet.bytes.BytesInput$ByteBufferBytesInput.toByteBuffer(BytesInput.java:438) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:612) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:61) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:546) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:538) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:141) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:538) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:530) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:642) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:358) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:82) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:77) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:140) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:106) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:106) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:82) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
      	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:268) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
      	... 14 common frames omitted
      2016-07-06 16:21:14,585 [CONTROL-rpc-event-queue] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
      2016-07-06 16:21:14,590 [CONTROL-rpc-event-queue] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment. 28826d94-a4bb-325d-6475-d440a1c78da0:0:0 does not exist.
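The innermost frame is `java.nio.Buffer.limit`, which throws `IllegalArgumentException` when the requested limit is negative or larger than the buffer's capacity; on the 1.7.0_79 JVM used here that exception carries no message, hence `IllegalArgumentException: null`. The frames above it (`DataPageV2.accept`, `ColumnReaderImpl.readPageV2`) show the file uses data page v2, where the repetition and definition levels are stored uncompressed ahead of the values and only the values section is compressed. Note also that `parquet-tools` 1.8.2-SNAPSHOT dumps the file without error, while the failing frames come from the `1.8.1-drill-r0` Parquet jars. One plausible explanation, not confirmed by this log, is that the reader sizes a view of a page buffer with a length that does not match the buffer it actually holds. The sketch below is not Drill or parquet-mr code: `PageV2Sketch` and the page sizes in `main` are hypothetical, and it only shows how such a size mismatch surfaces as this exception.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch, not Drill or parquet-mr code. In a data page v2 the repetition and
 * definition levels are stored uncompressed before the values and only the values section
 * is compressed, so the page's uncompressed size includes the level bytes.
 */
public final class PageV2Sketch {

    /** Decompressed length of the values section of a v2 page. */
    static int valuesLength(int uncompressedPageSize, int repLevelsBytes, int defLevelsBytes) {
        return uncompressedPageSize - repLevelsBytes - defLevelsBytes;
    }

    /**
     * View of bytes [offset, offset + length) of buf. Buffer.limit throws
     * IllegalArgumentException if offset + length is negative or exceeds the capacity
     * (on Java 7 the exception has no message).
     */
    static ByteBuffer view(ByteBuffer buf, int offset, int length) {
        ByteBuffer dup = buf.duplicate();
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    public static void main(String[] args) {
        // Hypothetical page: 2 bytes of repetition levels, 3 of definition levels, 25 bytes in total.
        int uncompressedPageSize = 25;
        int valuesBytes = valuesLength(uncompressedPageSize, 2, 3);
        ByteBuffer decompressedValues = ByteBuffer.allocate(valuesBytes);

        System.out.println("values view: " + view(decompressedValues, 0, valuesBytes).remaining() + " bytes");
        try {
            // Sizing the view with the whole page size instead of the values length overruns the buffer.
            view(decompressedValues, 0, uncompressedPageSize);
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```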
      

            People

            • Assignee: Padma Penumarthy (ppenumarthy)
            • Reporter: Chun Chang (cchang@maprtech.com)
            • Votes: 0
            • Watchers: 2
