Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
I generated a Parquet file from a protobuf with the following proto definition:
message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
This gets translated to the following Parquet schema, using the new compliant schema for lists:
message SomeEvent {
  optional group client_position = 1 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}
This causes parquet-cli cat to fail on a file containing these events:
java.lang.RuntimeException: Failed on record 0
at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
at org.apache.parquet.cli.Main.run(Main.java:157)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
... 3 more
Using the old parquet-tools binary to cat this file works fine.
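
For readers unfamiliar with the 3-level list layout, the following is a minimal Python sketch of the failure mode visible in the stack trace. It is not parquet-mr code: `PrimitiveType`, `GroupType`, `element_type`, and `convert_assuming_record` are hypothetical names invented for illustration. It only models the schema above and shows what happens when a reader treats the list element as a group although it is a primitive. It does not attempt to explain why the Avro converter makes that assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrimitiveType:
    """Hypothetical stand-in for a primitive Parquet field."""
    repetition: str
    primitive: str
    name: str

    def as_group_type(self):
        # A primitive cannot be viewed as a group; the message mirrors the trace.
        raise TypeError(f"{self} is not a group")

    def __str__(self):
        return f"{self.repetition} {self.primitive} {self.name}"


@dataclass
class GroupType:
    """Hypothetical stand-in for a Parquet group field."""
    repetition: str
    name: str
    fields: List[object] = field(default_factory=list)

    def as_group_type(self):
        return self


# The 3-level list from the schema above:
# optional group index (LIST) -> repeated group list -> required int32 element
element = PrimitiveType("required", "int32", "element")
repeated = GroupType("repeated", "list", [element])
index = GroupType("optional", "index", [repeated])


def element_type(list_group):
    """Return the element type of a 3-level list (child of the repeated group)."""
    return list_group.fields[0].fields[0]


def convert_assuming_record(parquet_type):
    """Assumption for illustration: a reader that expects every element to be a record."""
    return parquet_type.as_group_type()


try:
    convert_assuming_record(element_type(index))
except TypeError as err:
    print(err)  # required int32 element is not a group
```

In this sketch, a reader would need to check whether the element is a primitive before calling `as_group_type()`. Whether the same check is what parquet-cli is missing is an open question for this issue.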