  Apache Drill / DRILL-3871

Off by one error while reading binary fields with one terminal null in parquet



    Description

      Both tables in the join were created by Impala, with column c_timestamp stored as Parquet INT96.

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         max(t1.c_timestamp),
      . . . . . . . . . . . . >         min(t1.c_timestamp),
      . . . . . . . . . . . . >         count(t1.c_timestamp)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         imp_t1 t1
      . . . . . . . . . . . . >                 inner join
      . . . . . . . . . . . . >         imp_t2 t2
      . . . . . . . . . . . . > on      (t1.c_timestamp = t2.c_timestamp)
      . . . . . . . . . . . . > ;
      java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
      
      Fragment 0:0
      
      [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
              at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
              at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
              at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
              at sqlline.SqlLine.print(SqlLine.java:1583)
              at sqlline.Commands.execute(Commands.java:852)
              at sqlline.Commands.sql(Commands.java:751)
              at sqlline.SqlLine.dispatch(SqlLine.java:738)
              at sqlline.SqlLine.begin(SqlLine.java:612)
              at sqlline.SqlLine.start(SqlLine.java:366)
              at sqlline.SqlLine.main(SqlLine.java:259)
      

      drillbit.log

      2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
      2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
      2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, Average start: 1.332000 μs .
      2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
      2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
      2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> FAILED
      2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> FINISHED
      2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
      
      Fragment 0:0
      
      [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
      org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
      
      Fragment 0:0
      
      [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
              at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
      Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
      Message:
      Hadoop path: /drill/testdata/subqueries/imp_t2/bf4261140dac8d45-814d66b86bf960b8_853027779_data.0.parq
      Total records read: 10
      Mock records read: 0
      Records to read: 1
      Row group index: 0
      Records in row group: 10
      Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
        optional binary c_varchar (UTF8);
        optional int32 c_integer;
        optional int64 c_bigint;
        optional float c_float;
        optional double c_double;
        optional binary c_date (UTF8);
        optional binary c_time (UTF8);
        optional int96 c_timestamp;
        optional boolean c_boolean;
        optional double d9;
        optional double d18;
        optional double d28;
        optional double d38;
      }
      , metadata: {}}, blocks: [BlockMetaData{10, 1507 [ColumnMetaData{SNAPPY [c_varchar] BINARY  [PLAIN, PLAIN_DICTIONARY, RLE], 173}, ColumnMetaData{SNAPPY [c_integer] INT32  [PLAIN, PLAIN_DICTIONARY, RLE], 299}, ColumnMetaData{SNAPPY [c_bigint] INT64  [PLAIN, PLAIN_DICTIONARY, RLE], 453}, ColumnMetaData{SNAPPY [c_float] FLOAT  [PLAIN, PLAIN_DICTIONARY, RLE], 581}, ColumnMetaData{SNAPPY [c_double] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 747}, ColumnMetaData{SNAPPY [c_date] BINARY  [PLAIN, PLAIN_DICTIONARY, RLE], 900}, ColumnMetaData{SNAPPY [c_time] BINARY  [PLAIN, PLAIN_DICTIONARY, RLE], 1045}, ColumnMetaData{SNAPPY [c_timestamp] INT96  [PLAIN, PLAIN_DICTIONARY, RLE], 1213}, ColumnMetaData{SNAPPY [c_boolean] BOOLEAN  [PLAIN, PLAIN_DICTIONARY, RLE], 1293}, ColumnMetaData{SNAPPY [d9] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1448}, ColumnMetaData{SNAPPY [d18] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1609}, ColumnMetaData{SNAPPY [d28] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1771}, ColumnMetaData{SNAPPY [d38] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1933}]}]}
              at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:448) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:403) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:218) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:136) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
              at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              ... 4 common frames omitted
      Caused by: java.io.IOException: can not read class parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
              at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
              at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
              at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:46) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:191) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:387) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:430) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
              ... 43 common frames omitted
      Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
              at parquet.format.PageHeader.read(PageHeader.java:905) ~[parquet-format-2.1.1-drill-r1.jar:na]
              at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
              ... 49 common frames omitted
      2015-09-30 21:15:45,951 [BitServer-4] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
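
      The counters in the failure message above ("Total records read: 10", "Records to read: 1", "Records in row group: 10") suggest the reader still believes one record is pending after it has already consumed the entire row group, so it asks for one more page header past the end of the column chunk and Thrift fails to deserialize a PageHeader. Below is a minimal sketch of that failure mode only; the class and method names (OffByOnePageReadSketch, FakePageSource, nextPage) are hypothetical stand-ins and are not Drill's actual reader code.

      // Illustrative sketch, not Drill code: it reproduces the bookkeeping seen in
      // the error above. One phantom record is left after the whole row group has
      // been read, so the loop requests a page header that does not exist.
      public class OffByOnePageReadSketch {

          static class FakePageSource {
              private final int totalPages;
              private int pagesReturned = 0;

              FakePageSource(int totalPages) {
                  this.totalPages = totalPages;
              }

              // Stand-in for reading the next page header; asking for a page past
              // the last one is what surfaces as the TProtocolException in the log.
              int[] nextPage(int valuesPerPage) {
                  if (pagesReturned++ >= totalPages) {
                      throw new IllegalStateException(
                              "read past end of column chunk: no more page headers");
                  }
                  return new int[valuesPerPage];
              }
          }

          public static void main(String[] args) {
              int recordsInRowGroup = 10;                   // "Records in row group: 10"
              FakePageSource pages = new FakePageSource(1); // single data page in the chunk

              int totalRecordsRead = 0;
              // Off-by-one: the terminal null is not accounted for, leaving one
              // phantom record to read ("Records to read: 1").
              int recordsToRead = recordsInRowGroup + 1;

              while (recordsToRead > 0) {
                  int[] page = pages.nextPage(10);          // second call throws
                  int consumed = Math.min(page.length, recordsToRead);
                  totalRecordsRead += consumed;             // reaches 10, then fails
                  recordsToRead -= consumed;
              }
              System.out.println("read " + totalRecordsRead + " records");
          }
      }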
      

      Attachments

        tables.tar (609 kB), attached by Victoria Markman


          People

            Assignee: adeneche (Abdel Hakim Deneche)
            Reporter: vicky (Victoria Markman)
