PARQUET-1356

Error when closing writer - Statistics comparator mismatched

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels:
      None
    • Environment:

      Mac OS Sierra 10.12.6

      IntelliJ 2018.1.6

      sbt 0.1

      scala-sdk-2.12.4

      java jdk1.8.0_144

      Description

      Hi all

      After hitting an error in my custom implementation, I ran a test case copied from here and, surprisingly, got the same error.

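      import java.io.File
      import java.util

      import org.apache.avro.Schema
      import org.apache.avro.generic.{GenericRecord, GenericRecordBuilder}
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

      // Avro schema: a record with a single field, "myarray", holding an array of ints.
      val schema = new Schema.Parser().parse("{\n \"type\": \"record\",\n \"name\": \"myrecord\",\n \"fields\": [ {\n \"name\": \"myarray\",\n \"type\": {\n \"type\": \"array\",\n \"items\": \"int\"\n }\n } ]\n}")

      // Reserve a temp file name, then delete the file so the writer can create it.
      val tmp = File.createTempFile(getClass().getSimpleName(), ".tmp")
      tmp.deleteOnExit()
      tmp.delete()
      val file = new Path(tmp.getPath())
      val testConf = new Configuration()
      val writer = AvroParquetWriter
       .builder[GenericRecord](file)
       .withSchema(schema)
       .withConf(testConf)
       .build()

      // Write a record with an empty array.
      val emptyArray = new util.ArrayList[Integer]()
      val record = new GenericRecordBuilder(schema)
       .set("myarray", emptyArray).build()
      writer.write(record)
      // The exception in the stack trace is thrown from this close() call.
      writer.close()

      // Read the record back.
      val reader = new AvroParquetReader[GenericRecord](testConf, file)
      val nextRecord = reader.read()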

      The project uses Scala and sbt, with the following dependencies:

      lazy val parquetVersion = "1.10.0"
      lazy val parquet = "org.apache.parquet" % "parquet" % Version.parquetVersion
      lazy val parquetAvro = "org.apache.parquet" % "parquet-avro" % Version.parquetVersion
      

       

      And this is the stack trace:

       

      Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR (39 milliseconds)
      [info] org.apache.parquet.column.statistics.StatisticsClassException: Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR
      [info] at org.apache.parquet.column.statistics.StatisticsClassException.create(StatisticsClassException.java:42)
      [info] at org.apache.parquet.column.statistics.Statistics.mergeStatistics(Statistics.java:327)
      [info] at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:119)
      [info] at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
      [info] at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
      [info] at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
      [info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:169)
      [info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
      [info] at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
      [info] at uk.co.mypackage.di.nrt.HdfsPipelineSpec.$anonfun$new$4(HdfsPipelineSpec.scala:132)
       
      

      As you can see, the error message is confusing: it reports a mismatch between two comparators that are both SIGNED_INT32_COMPARATOR, so there appears to be no mismatch at all. I would really appreciate help with this issue.
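
      As a general illustration (not a diagnosis of Parquet's internals), a message that prints the same name on both sides can still come from a failed equality check when the check compares more than what is printed. The sketch below uses hypothetical classes, `Comparator`, `ColumnType` and `MismatchDemo`, which are not Parquet types; the `origin` field is an assumed stand-in for whatever extra state the equality check might consider.

```scala
// Hypothetical types for illustration only; these are NOT Parquet classes.
final class Comparator(val name: String) {
  override def toString: String = name
}

// `origin` is an assumed extra field that takes part in equality
// but never appears in the error message.
final class ColumnType(val comparator: Comparator, val origin: String) {
  override def equals(other: Any): Boolean = other match {
    case that: ColumnType =>
      comparator.name == that.comparator.name && origin == that.origin
    case _ => false
  }
  override def hashCode: Int = (comparator.name, origin).hashCode
}

object MismatchDemo {
  // Merging succeeds only when the two types are equal; on failure the
  // message prints just the comparators, which may look identical.
  def merge(a: ColumnType, b: ColumnType): Either[String, ColumnType] =
    if (a == b) Right(a)
    else Left(s"comparator mismatched: ${a.comparator} vs. ${b.comparator}")

  def main(args: Array[String]): Unit = {
    val left  = new ColumnType(new Comparator("SIGNED_INT32_COMPARATOR"), "writer")
    val right = new ColumnType(new Comparator("SIGNED_INT32_COMPARATOR"), "page")
    println(merge(left, right))
  }
}
```

      Here `merge` fails even though both printed names are identical, because the two values differ in a field the message does not show.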

      Thanks

            People

            • Assignee:
              Unassigned
            • Reporter:
              ajimenez Andres