Parquet / PARQUET-1356

Error when closing writer - Statistics comparator mismatched

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None
    • Environment:
      Mac OS Sierra 10.12.6
      IntelliJ 2018.1.6
      sbt 0.1
      scala-sdk-2.12.4
      java jdk1.8.0_144

    Description

      Hi all

      After hitting an error in my custom implementation, I ran a test case copied from here and, surprisingly, got the same error.

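      import java.io.File
      import java.util
      import org.apache.avro.Schema
      import org.apache.avro.generic.{GenericRecord, GenericRecordBuilder}
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

      // Avro schema: a record with a single field holding an array of ints.
      val schema = new Schema.Parser().parse(
        """{
          |  "type": "record",
          |  "name": "myrecord",
          |  "fields": [ {
          |    "name": "myarray",
          |    "type": { "type": "array", "items": "int" }
          |  } ]
          |}""".stripMargin)

      // Reserve a unique temp path, then delete the file so the writer can create it.
      val tmp = File.createTempFile(getClass.getSimpleName, ".tmp")
      tmp.deleteOnExit()
      tmp.delete()
      val file = new Path(tmp.getPath)
      val testConf = new Configuration()
      val writer = AvroParquetWriter
        .builder[GenericRecord](file)
        .withSchema(schema)
        .withConf(testConf)
        .build()

      // Write a record with an empty array.
      val emptyArray = new util.ArrayList[Integer]()
      val record = new GenericRecordBuilder(schema)
        .set("myarray", emptyArray).build()
      writer.write(record)
      writer.close() // the exception in the stack trace below is thrown here

      val reader = new AvroParquetReader[GenericRecord](testConf, file)
      val nextRecord = reader.read()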

      The project uses Scala and sbt, with the following dependencies:

      lazy val parquetVersion = "1.10.0"
      lazy val parquet = "org.apache.parquet" % "parquet" % Version.parquetVersion
      lazy val parquetAvro = "org.apache.parquet" % "parquet-avro" % Version.parquetVersion
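
Declared dependencies do not by themselves show which jar a class is loaded from at runtime, so one quick check is to print the code source of a class. The following is a minimal sketch; `JarLocator` is a hypothetical helper, not part of Parquet, and nothing in this report confirms that a classpath conflict is involved.

```scala
// Hypothetical diagnostic helper (an assumption for illustration, not a Parquet API).
object JarLocator {
  /** Returns the location a class was loaded from, or None when the JVM
    * does not expose one (for example, for bootstrap classes such as String). */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString
}
```

In the failing test this could be called as `JarLocator.locationOf(classOf[org.apache.parquet.column.statistics.Statistics[_]])` and the result compared with the expected parquet-column 1.10.0 jar.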
      

       

      And this is the stack trace:

       

      Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR (39 milliseconds)
      [info] org.apache.parquet.column.statistics.StatisticsClassException: Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR
      [info] at org.apache.parquet.column.statistics.StatisticsClassException.create(StatisticsClassException.java:42)
      [info] at org.apache.parquet.column.statistics.Statistics.mergeStatistics(Statistics.java:327)
      [info] at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:119)
      [info] at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
      [info] at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
      [info] at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
      [info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:169)
      [info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
      [info] at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
      [info] at uk.co.mypackage.di.nrt.HdfsPipelineSpec.$anonfun$new$4(HdfsPipelineSpec.scala:132)
       
      

      As you can see, the error is confusing: it reports a mismatch between two comparators that have the same name, SIGNED_INT32_COMPARATOR on both sides. I would really appreciate help with this issue.
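
A message of the form "X vs. X" can arise whenever the equality check covers more than what the message prints. The sketch below illustrates that general pattern only; `FieldType`, `Stats`, and `SameNameMismatch` are hypothetical stand-ins, not Parquet classes, and the report does not establish that this is what happens inside Parquet.

```scala
// Hypothetical stand-ins for illustration; these are NOT Parquet classes.
final case class FieldType(comparatorName: String, extraMetadata: String)

final class Stats(val fieldType: FieldType) {
  // Merging is allowed only when the full types are equal,
  // but the error message prints just the comparator name.
  def merge(other: Stats): Either[String, Stats] =
    if (fieldType == other.fieldType) Right(this)
    else Left(
      s"Statistics comparator mismatched: ${fieldType.comparatorName} vs. ${other.fieldType.comparatorName}")
}

object SameNameMismatch {
  def demo(): Either[String, Stats] = {
    // Same comparator name, different metadata: the types are not equal.
    val a = new Stats(FieldType("SIGNED_INT32_COMPARATOR", "a"))
    val b = new Stats(FieldType("SIGNED_INT32_COMPARATOR", "b"))
    a.merge(b)
  }
}
```

Under these assumptions, `SameNameMismatch.demo()` returns a `Left` whose message names the same comparator on both sides, even though the two types differ.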

      Thanks

          People

            Assignee: Unassigned
            Reporter: Andres (ajimenez)