Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.4.0, 1.5.0, 1.6.0
Fix Version/s: None
Description
When I generate a file using MapReduce and Parquet 1.8.1 (or 1.8.1-drill-r0) that contains a REQUIRED INT64 field, I am not able to read this column in Drill, but I can read the full content using parquet-tools cat/dump. This doesn't happen every time; it depends on the input data (so Parquet probably chooses a different encoding for the given column?).
Error reported by Drill:
2016-03-02 03:01:16,354 [29296305-abe2-f4bd-ded0-27bb53f631f0:frag:3:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
Fragment 3:0
[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.
Fragment 3:0
[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
Message:
Hadoop path: /tmp/tmp.gz.parquet
Total records read: 131070
Mock records read: 0
Records to read: 21845
Row group index: 0
Records in row group: 2418197
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message nat { required int64 ts; required int32 dr; optional binary ui (UTF8); optional int32 up; optional binary ri (UTF8); optional int32 rp; optional binary di (UTF8); optional int32 dp; required int32 pr; optional int64 ob; optional int64 ib; } , metadata: {}}, blocks: [BlockMetaData{2418197, 30601003 [ColumnMetaData{GZIP [ts] INT64 [PLAIN_DICTIONARY, BIT_PACKED, PLAIN], 4}, ColumnMetaData{GZIP [dr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 2630991}, ColumnMetaData{GZIP [ui] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2964867}, ColumnMetaData{GZIP [up] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2966955}, ColumnMetaData{GZIP [ri] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7481618}, ColumnMetaData{GZIP [rp] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7483706}, ColumnMetaData{GZIP [di] BINARY [RLE, BIT_PACKED, PLAIN], 11995191}, ColumnMetaData{GZIP [dp] INT32 [RLE, BIT_PACKED, PLAIN], 11995247}, ColumnMetaData{GZIP [pr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 11995303}, ColumnMetaData{GZIP [ob] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11995930}, ColumnMetaData{GZIP [ib] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11999527}]}]}
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:345) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:447) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_40]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_40]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
    ... 4 common frames omitted
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) ~[parquet-common-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:84) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:66) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:122) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders$DictionaryBigIntReader.readField(ParquetFixedWidthDictionaryReaders.java:182) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:120) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:169) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:146) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:107) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:386) ~[drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:429) ~[drill-java-exec-1.4.0.jar:1.4.0]
    ... 19 common frames omitted
When I change the fields in the schema to optional and regenerate the file, Drill starts working. The same is true when I generate the file using CTAS (which also makes all columns optional).
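For readers unfamiliar with the encoding named in the exception: dictionary-encoded pages store their dictionary indices as an RLE/bit-packing hybrid stream. The Python sketch below decodes such a stream as I understand it from the Parquet format specification, and shows that asking for more values than the stream holds fails in a way analogous to "Reading past RLE/BitPacking stream". It is only an illustration, not Drill's or parquet-mr's code; `decode_hybrid`, `read_uleb128` and `ReadPastStream` are hypothetical names, and whether Drill actually over-reads for this reason is unconfirmed.

```python
import io


class ReadPastStream(ValueError):
    """Hypothetical error type, raised when the decoder runs out of input."""


def read_uleb128(buf):
    # Run headers are unsigned LEB128 varints.
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ReadPastStream("Reading past RLE/BitPacking stream.")
        result |= (chunk[0] & 0x7F) << shift
        if chunk[0] & 0x80 == 0:
            return result
        shift += 7


def decode_hybrid(data, bit_width, num_values):
    """Decode num_values integers from an RLE/bit-packing hybrid stream."""
    buf = io.BytesIO(data)
    byte_width = (bit_width + 7) // 8
    mask = (1 << bit_width) - 1
    out = []
    while len(out) < num_values:
        header = read_uleb128(buf)
        if header & 1 == 0:
            # RLE run: (header >> 1) repetitions of one little-endian value.
            raw = buf.read(byte_width)
            if len(raw) < byte_width:
                raise ReadPastStream("Reading past RLE/BitPacking stream.")
            out.extend([int.from_bytes(raw, "little")] * (header >> 1))
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, LSB first.
            groups = header >> 1
            raw = buf.read(groups * bit_width)
            if len(raw) < groups * bit_width:
                raise ReadPastStream("Reading past RLE/BitPacking stream.")
            bits = int.from_bytes(raw, "little")
            for i in range(groups * 8):
                out.append((bits >> (i * bit_width)) & mask)
    return out[:num_values]


# One bit-packed group holding the values 0..7 at bit width 3.
packed = bytes([0x03, 0x88, 0xC6, 0xFA])
# One RLE run: the value 4 repeated 5 times.
rle = bytes([0x0A, 0x04])

print(decode_hybrid(packed, 3, 8))
print(decode_hybrid(rle, 3, 5))
try:
    decode_hybrid(rle, 3, 6)  # one value more than the stream encodes
except ReadPastStream as exc:
    print("error:", exc)
```

A reader that expects more dictionary indices than a page contains, or that applies this decoder to a page that is not dictionary-encoded, would fail like the last call. In the metadata above, ts is the only required column whose encodings include PLAIN in addition to PLAIN_DICTIONARY.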