Parquet / PARQUET-2181

parquet-cli fails to read files generated by parquet-protobuf

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-cli
    • Labels: None

    Description

      I generated a Parquet file with parquet-protobuf from this proto definition:

      message IndexPath {
        // Index of item in path.
        repeated int32 index = 1;
      }
      
      message SomeEvent {
        // truncated/obfuscated wrapper
        optional IndexPath client_position = 1;
      }
      

      This is translated to the following Parquet schema, using the new spec-compliant schema for lists:

      message SomeEvent {
        optional group client_position = 1 {
          optional group index (LIST) = 1 {
            repeated group list {
              required int32 element;
            }
          }
        }
      }
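
The schema above uses the three-level LIST layout from the Parquet format specification: a LIST-annotated group wraps a repeated group, which in turn holds the element field. As a rough illustration of how a reader might locate the element type, the sketch below applies the specification's backward-compatibility rules to a toy schema model. The classes `Node` and `ListElementSketch` are hypothetical stand-ins introduced for this example; they are not parquet-mr APIs and make no claim about how parquet-avro resolves lists internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, minimal stand-in for a Parquet schema node (not the parquet-mr API).
final class Node {
    final String repetition;    // "required", "optional" or "repeated"
    final String name;
    final String primitiveType; // e.g. "int32"; null for groups
    final List<Node> fields;    // empty for primitives

    private Node(String repetition, String name, String primitiveType, List<Node> fields) {
        this.repetition = repetition;
        this.name = name;
        this.primitiveType = primitiveType;
        this.fields = fields;
    }

    static Node primitive(String repetition, String primitiveType, String name) {
        return new Node(repetition, name, primitiveType, Collections.<Node>emptyList());
    }

    static Node group(String repetition, String name, Node... fields) {
        return new Node(repetition, name, null, Arrays.asList(fields));
    }

    boolean isPrimitive() {
        return primitiveType != null;
    }

    @Override
    public String toString() {
        return isPrimitive()
            ? repetition + " " + primitiveType + " " + name
            : repetition + " group " + name;
    }
}

final class ListElementSketch {
    // Returns the node that carries the element type of a LIST-annotated group,
    // following the backward-compatibility rules of the Parquet format specification.
    static Node elementOf(Node listGroup) {
        if (listGroup.fields.size() != 1 || !"repeated".equals(listGroup.fields.get(0).repetition)) {
            throw new IllegalArgumentException("LIST group must hold exactly one repeated field");
        }
        Node repeated = listGroup.fields.get(0);
        // Rule 1: a repeated primitive is itself the element (legacy two-level list).
        if (repeated.isPrimitive()) return repeated;
        // Rule 2: a repeated group with several fields is itself the element.
        if (repeated.fields.size() > 1) return repeated;
        // Rule 3: the legacy names "array" and "<list name>_tuple" also mark the element.
        if (repeated.name.equals("array") || repeated.name.equals(listGroup.name + "_tuple")) {
            return repeated;
        }
        // Otherwise: three-level list, so the single inner field is the element.
        return repeated.fields.get(0);
    }

    public static void main(String[] args) {
        // The `index` list from the schema above.
        Node index = Node.group("optional", "index",
            Node.group("repeated", "list",
                Node.primitive("required", "int32", "element")));
        System.out.println(elementOf(index)); // prints "required int32 element"
    }
}
```

Under these rules the element of `index` is the primitive `required int32 element`, not a group, which is the same type named in the exception message reported for this file.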
      

      Running parquet-cli cat on a file containing these events then fails with:

      java.lang.RuntimeException: Failed on record 0
              at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
              at org.apache.parquet.cli.Main.run(Main.java:157)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
              at org.apache.parquet.cli.Main.main(Main.java:187)
      Caused by: java.lang.ClassCastException: required int32 element is not a group
              at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
              at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
              at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
              at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
              at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
              at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
              at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
              at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
              at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
              at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
              at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
              at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
              at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
              at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
              at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
              at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
              at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
              at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
              at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
              at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
              ... 3 more
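
The innermost frame shows `Type.asGroupType` rejecting the primitive `required int32 element`, which suggests that the Avro converter expected a group (record) at the element position. Why it expects one is not established in this report. The sketch below only mirrors the shape of the failure with hypothetical stand-in classes (`SketchType`, `SketchPrimitive`, `SketchGroup`, `CastFailureSketch`); it is not the parquet-mr implementation.

```java
// Hypothetical stand-ins that mimic the shape of the failure; not parquet-mr classes.
abstract class SketchType {
    final String repetition;
    final String name;

    SketchType(String repetition, String name) {
        this.repetition = repetition;
        this.name = name;
    }

    abstract SketchGroup asGroupType();
}

final class SketchPrimitive extends SketchType {
    final String primitiveType;

    SketchPrimitive(String repetition, String primitiveType, String name) {
        super(repetition, name);
        this.primitiveType = primitiveType;
    }

    @Override
    SketchGroup asGroupType() {
        // Same message shape as the "Caused by" line in the trace.
        throw new ClassCastException(this + " is not a group");
    }

    @Override
    public String toString() {
        return repetition + " " + primitiveType + " " + name;
    }
}

final class SketchGroup extends SketchType {
    final SketchType[] fields;

    SketchGroup(String repetition, String name, SketchType... fields) {
        super(repetition, name);
        this.fields = fields;
    }

    @Override
    SketchGroup asGroupType() {
        return this;
    }

    @Override
    public String toString() {
        return repetition + " group " + name;
    }
}

final class CastFailureSketch {
    // Mimics a converter that assumes the list element is a record (group).
    static String describeElementAsRecord(SketchGroup repeated) {
        try {
            SketchGroup element = repeated.fields[0].asGroupType();
            return "record element with " + element.fields.length + " field(s)";
        } catch (ClassCastException e) {
            return "ClassCastException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The repeated group `list` from the schema in this report.
        SketchGroup list = new SketchGroup("repeated", "list",
            new SketchPrimitive("required", "int32", "element"));
        System.out.println(describeElementAsRecord(list));
    }
}
```

Run against the `list` group from this report, the sketch yields the same message as the trace, `required int32 element is not a group`. A repeated group whose single field is itself a group would pass through without error.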

      Using the old parquet-tools binary to cat the same file works fine.

      Attachments

        1. sample-depth-1.tgz, 357 kB, J Y
        2. samples.tgz, 793 kB, J Y

          People

            Assignee: Unassigned
            Reporter: J Y (jinyius)