Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Not A Bug
Mac OS Sierra 10.12.6
IntelliJ 2018.1.6
sbt 0.1
scala-sdk-2.12.4
java jdk1.8.0_144
Description
Hi all
After hitting an error in my custom implementation, I ran a test case copied from here and, surprisingly, got the same error.
val schema = new Schema.Parser().parse("{\n \"type\": \"record\",\n \"name\": \"myrecord\",\n \"fields\": [ {\n \"name\": \"myarray\",\n \"type\": {\n \"type\": \"array\",\n \"items\": \"int\"\n }\n } ]\n}")

val tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
tmp.deleteOnExit();
tmp.delete();
val file = new Path(tmp.getPath());
val testConf = new Configuration();

val writer = AvroParquetWriter
  .builder[GenericRecord](file)
  .withSchema(schema)
  .withConf(testConf)
  .build();

// Write a record with an empty array.
val emptyArray = new util.ArrayList[Integer]();
val record = new GenericRecordBuilder(schema)
  .set("myarray", emptyArray).build();
writer.write(record);
writer.close();

val reader = new AvroParquetReader[GenericRecord](testConf, file);
val nextRecord = reader.read()
The project is Scala + sbt, with the following dependencies:
lazy val parquetVersion = "1.10.0"
lazy val parquet = "org.apache.parquet" % "parquet" % Version.parquetVersion
lazy val parquetAvro = "org.apache.parquet" % "parquet-avro" % Version.parquetVersion
And this is the stack trace:
Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR (39 milliseconds)
[info] org.apache.parquet.column.statistics.StatisticsClassException: Statistics comparator mismatched: SIGNED_INT32_COMPARATOR vs. SIGNED_INT32_COMPARATOR
[info] at org.apache.parquet.column.statistics.StatisticsClassException.create(StatisticsClassException.java:42)
[info] at org.apache.parquet.column.statistics.Statistics.mergeStatistics(Statistics.java:327)
[info] at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:119)
[info] at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
[info] at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
[info] at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
[info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:169)
[info] at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
[info] at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
[info] at uk.co.mypackage.di.nrt.HdfsPipelineSpec.$anonfun$new$4(HdfsPipelineSpec.scala:132)
As you can see, this is confusing. The error itself is strange: both comparators named in the message are SIGNED_INT32_COMPARATOR, so there appears to be no mismatch at all. I would really appreciate help with this issue.
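For what it's worth, a message of this shape can appear whenever two objects print the same name but do not compare as equal. The sketch below only illustrates that general point. It is not Parquet code, and it is not confirmed as the cause of the failure above. NamedComparator, ComparatorMismatchSketch and describe are hypothetical names made up for the illustration.

```scala
// Hypothetical stand-in for a comparator; this is NOT a Parquet class.
// It does not override equals, so == falls back to reference equality.
final class NamedComparator(name: String) {
  override def toString: String = name
}

object ComparatorMismatchSketch {
  // Builds a message in the same shape as the one in the stack trace.
  def describe(a: NamedComparator, b: NamedComparator): String =
    if (a == b) "comparators match"
    else s"Statistics comparator mismatched: $a vs. $b"

  def main(args: Array[String]): Unit = {
    val first = new NamedComparator("SIGNED_INT32_COMPARATOR")
    val second = new NamedComparator("SIGNED_INT32_COMPARATOR")

    // Same instance on both sides: the check passes.
    println(describe(first, first))
    // Two distinct instances with the same printed name: the check fails,
    // and the message shows the same name on both sides.
    println(describe(first, second))
  }
}
```

If something like this is going on, the printed names alone would not show which of the two objects differs, so comparing the objects themselves (for example their classes and identity hash codes) might be more informative than the message.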
Thanks