Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Environment: CentOS release 6.5
Description
=================================================
DRILL WRITING A PARQUET TABLE WITH NESTED DATA
=================================================
I have a JSON file with nested data (schema present below):
I can read this JSON file successfully from Drill and access its nested values. However, when I try to import the data by creating a table in Parquet format, the query fails:
QUERY: create table test as select * from `/user/root/sample-data/nested_student.json`;
ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "3ce3dc1e-d920-4262-ae2d-28bd2d034597"
endpoint {
address: "perfnode154.perf.lab"
user_port: 31010
control_port: 31011
data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < ParquetEncodingException:[ error starting field interests at 6 ] < ClassCastException:[ parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO ]"
]
Error: exception while executing query (state=,code=0)
2014-06-24 00:41:18,646 [b10db58d-8d4d-4d02-9fb5-a5081e5cb254:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 48602de2-8306-47d2-875f-8ad2cd2e964a: Failure while running fragment.
java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO
    at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.startField(MessageColumnIO.java:171) ~[parquet-column-1.5.0-20140513.004024-1.jar:na]
    at org.apache.drill.exec.store.ParquetOutputRecordWriter.addRepeatedVarCharHolder(ParquetOutputRecordWriter.java:761) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.EventBasedRecordWriter$RepeatedVarCharFieldWriter.writeField(EventBasedRecordWriter.java:1156) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:150) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:111) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:72) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:65) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:45) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:94) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:56) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:85) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:46) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:100) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
=================================================
DRILL READING A PARQUET TABLE WITH NESTED DATA
=================================================
I generated a Parquet file by loading the JSON file below into Pig and storing it in Parquet format:
{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
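To make the shape of this input explicit, here is a minimal Python sketch that walks the two sample records and reports each leaf column and whether it sits inside an array. It is illustrative only and is not part of Drill or Pig. The first ingredient of each recipe ("Beef", "Tomatoes") is an assumption inferred from the parquet-tools dump in this report, and the helper names `leaf_paths` and `summarize` are hypothetical. The paths it prints do not include the extra `bag` level that appears in the Pig-written file.

```python
import json

# Sample records from this report. The first ingredient of each recipe is an
# assumption inferred from the parquet-tools dump, not from the JSON itself.
RECORDS = [
    '{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},'
    '{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}',
    '{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},'
    '{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}',
]


def leaf_paths(value, prefix="", repeated=False):
    """Yield (dotted_path, is_repeated) for every scalar leaf in a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from leaf_paths(child, path, repeated)
    elif isinstance(value, list):
        # Anything below an array is a repeated field.
        for item in value:
            yield from leaf_paths(item, prefix, True)
    else:
        yield prefix, repeated


def summarize(lines):
    """Map each leaf path to True if it is repeated in any record."""
    columns = {}
    for line in lines:
        for path, repeated in leaf_paths(json.loads(line)):
            columns[path] = columns.get(path, False) or repeated
    return columns


if __name__ == "__main__":
    for path, repeated in summarize(RECORDS).items():
        print(path, "repeated" if repeated else "single")
```

Only `ingredients.name` is reported as repeated, which matches the one column in the dump with a non-zero repetition level.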
When I try to read this Parquet table in Drill, the query fails:
QUERY: Select * from `/user/root/complex.parquet`;
ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "c2e735f4-e11c-4e10-a410-959b3880dce0"
endpoint {
address: "perfnode154.perf.lab"
user_port: 31010
control_port: 31011
data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < UnsupportedOperationException:[ unsupported type: BINARY LIST ]"
]
Error: exception while executing query (state=,code=0)
2014-07-23 22:16:45,239 [d106ad59-595f-42e7-880a-ef9f6bff1ff0:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Failure while initializing operator tree
java.lang.UnsupportedOperationException: unsupported type: BINARY LIST
    at org.apache.drill.exec.store.parquet.ParquetRecordReader.toMajorType(ParquetRecordReader.java:446) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.parquet.ParquetRecordReader.setup(ParquetRecordReader.java:219) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:93) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:126) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:47) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProducerConsumer(AbstractPhysicalVisitor.java:191) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.config.ProducerConsumer.accept(ProducerConsumer.java:42) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:59) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitStore(AbstractPhysicalVisitor.java:118) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:176) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:95) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:81) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:242) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
I verified that the file contains repeated data by dumping it with parquet-tools:
./parquet-tools dump badpigparquet
row group 0
----------------------------------------
recipe: BINARY UNCOMPRESSED DO:0 FPO:4 SZ:85/85/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ingredients:
.bag:
..name: BINARY UNCOMPRESSED DO:0 FPO:89 SZ:120/120/1.00 VC:15 ENC:RLE,PLAIN_DICTIONARY
inventor:
.name: BINARY UNCOMPRESSED DO:0 FPO:209 SZ:74/74/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
.age: INT32 UNCOMPRESSED DO:0 FPO:283 SZ:64/64/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY

recipe TV=6 RL=0 DL=1 DS: 2 DE:PLAIN_DICTIONARY
----------------------------------------
page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:9 VC:6

ingredients.bag.name TV=15 RL=1 DL=3 DS: 5 DE:PLAIN_DICTIONARY
----------------------------------------
page 0: DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY SZ:21 VC:15

inventor.name TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY
----------------------------------------
page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6

inventor.age TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY
----------------------------------------
page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6

BINARY recipe
----------------------------------------
*** row group 1 of 1, values 1 to 6 ***
value 1: R:0 D:1 V:Tacos
value 2: R:0 D:1 V:TomatoSoup
value 3: R:0 D:1 V:Tacos
value 4: R:0 D:1 V:TomatoSoup
value 5: R:0 D:1 V:Tacos
value 6: R:0 D:1 V:TomatoSoup

BINARY ingredients.bag.name
----------------------------------------
*** row group 1 of 1, values 1 to 15 ***
value 1: R:0 D:3 V:Beef
value 2: R:1 D:3 V:Lettuce
value 3: R:1 D:3 V:Cheese
value 4: R:0 D:3 V:Tomatoes
value 5: R:1 D:3 V:Milk
value 6: R:0 D:3 V:Beef
value 7: R:1 D:3 V:Lettuce
value 8: R:1 D:3 V:Cheese
value 9: R:0 D:3 V:Tomatoes
value 10: R:1 D:3 V:Milk
value 11: R:0 D:3 V:Beef
value 12: R:1 D:3 V:Lettuce
value 13: R:1 D:3 V:Cheese
value 14: R:0 D:3 V:Tomatoes
value 15: R:1 D:3 V:Milk

BINARY inventor.name
----------------------------------------
*** row group 1 of 1, values 1 to 6 ***
value 1: R:0 D:2 V:Alex
value 2: R:0 D:2 V:Steve
value 3: R:0 D:2 V:Alex
value 4: R:0 D:2 V:Steve
value 5: R:0 D:2 V:Alex
value 6: R:0 D:2 V:Steve

INT32 inventor.age
----------------------------------------
*** row group 1 of 1, values 1 to 6 ***
value 1: R:0 D:2 V:25
value 2: R:0 D:2 V:23
value 3: R:0 D:2 V:25
value 4: R:0 D:2 V:23
value 5: R:0 D:2 V:25
value 6: R:0 D:2 V:23
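For anyone reading the dump, the repetition levels on `ingredients.bag.name` are what show the data is repeated: R:0 starts a new record and R:1 appends to the current record's list. The Python sketch below regroups the first five dumped values on that basis. It is a simplified illustration, not Drill's or Parquet's reader: it assumes a single level of repetition (RL=1) and that every value is at the maximum definition level (D:3), so there are no nulls or empty lists to handle. The function name `group_by_repetition` is hypothetical.

```python
def group_by_repetition(pairs):
    """Split (repetition_level, value) pairs into one list per record.

    Simplified rule for a column whose maximum repetition level is 1:
    level 0 opens a new record, level 1 continues the current list.
    """
    records = []
    for level, value in pairs:
        if level == 0 or not records:
            records.append([value])
        else:
            records[-1].append(value)
    return records


# First five values of ingredients.bag.name as shown in the dump above.
DUMPED = [
    (0, "Beef"),
    (1, "Lettuce"),
    (1, "Cheese"),
    (0, "Tomatoes"),
    (1, "Milk"),
]

if __name__ == "__main__":
    for ingredients in group_by_repetition(DUMPED):
        print(ingredients)
```

The two resulting lists line up with the Tacos and TomatoSoup records, which is consistent with the file holding a genuine repeated field.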