I think it is common practice, when writing table data to a Parquet file, to reuse the same objects across rows; if a column is a fixed-length byte array, the byte[] itself is reused as well.
If I use ByteArrayBackedBinary for such a column, the bug occurs: every row group created by a single task ends up with the same max & min binary value, namely the last row's byte content.
The reason is that BinaryStatistics keeps max & min as parquet.io.api.Binary references rather than copies; since ByteArrayBackedBinary just wraps the underlying byte[], the content of max & min always points to the reused array, and therefore to the latest row's value.
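A minimal sketch of the aliasing (assuming the pre-rename parquet.io.api.Binary API, where Binary.fromByteArray(byte[]) wraps the passed array without copying, i.e. yields a ByteArrayBackedBinary); statistics code holding onto such a reference as min/max would see the same effect:

    import parquet.io.api.Binary;

    // Two Binary values created from the same reused byte[] both reflect
    // whatever the buffer holds *now*, not what it held when they were created.
    public class ReusedBinaryDemo {
      public static void main(String[] args) {
        byte[] buffer = {'a', 'a', 'a'};             // buffer reused across "rows"

        // Row 1: statistics code would keep this Binary as its current min/max.
        Binary first = Binary.fromByteArray(buffer); // wraps the array, no copy

        // Row 2: the writer overwrites the same buffer with the next value.
        buffer[0] = 'z'; buffer[1] = 'z'; buffer[2] = 'z';
        Binary second = Binary.fromByteArray(buffer);

        // Both Binaries alias the same array, so the "remembered" first value
        // has silently changed to the last row's content.
        System.out.println(first.toStringUsingUTF8());  // prints "zzz", not "aaa"
        System.out.println(second.toStringUsingUTF8()); // prints "zzz"
      }
    }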
Does Parquet document anywhere that users must not reuse the byte[] backing a Binary value? If not, I think this is a bug; it can be reproduced with Spark SQL's RowWriteSupport.
Related Spark JIRA ticket: SPARK-6859