Description
The Parquet format specification states:
INT(8, true), INT(16, true), and INT(32, true) must annotate an int32 primitive type and INT(64, true) must annotate an int64 primitive type. INT(32, true) and INT(64, true) are implied by the int32 and int64 primitive types if no other annotation is present and should be considered optional.
However, the code in ParquetFilters.scala requires int32 and int64 columns to carry no annotation at all. If such a column does carry an annotation (for example, the optional INT(32, true) or INT(64, true)) and is part of a predicate pushdown, it does not match the hard-coded types, so the corresponding filter ends up being None.
This can impose a large performance penalty when reading a perfectly valid Parquet file, because the predicate is no longer pushed down to the Parquet reader.
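The mismatch can be illustrated with a small self-contained Scala sketch. It does not use Spark's or parquet-mr's real classes: `PhysicalType`, `IntAnnotation`, `ColumnType`, `strictFilterType`, `normalize`, and `lenientFilterType` are all hypothetical stand-ins, and the strings "IntegerType" and "LongType" are plain labels rather than Spark types. The strict matcher only accepts un-annotated int32/int64 columns, as described above. The lenient one first drops an annotation that the spec says is implied anyway (INT(32, true) on int32, INT(64, true) on int64), which is one possible way to handle it, assumed here rather than taken from Spark.

```scala
// Hypothetical, simplified model of a Parquet column's physical type and
// optional integer logical-type annotation. NOT the real Spark/parquet-mr API.
sealed trait PhysicalType
case object Int32 extends PhysicalType
case object Int64 extends PhysicalType

final case class IntAnnotation(bitWidth: Int, isSigned: Boolean)

final case class ColumnType(physical: PhysicalType, annotation: Option[IntAnnotation])

object PushdownSketch {
  // Strict matching: only un-annotated int32/int64 columns are recognized,
  // so an annotated column yields None and no filter is pushed down.
  def strictFilterType(col: ColumnType): Option[String] = col match {
    case ColumnType(Int32, None) => Some("IntegerType")
    case ColumnType(Int64, None) => Some("LongType")
    case _                       => None
  }

  // Per the spec, INT(32, true) on int32 and INT(64, true) on int64 are the
  // implied defaults, so removing them does not change the column's meaning.
  def normalize(col: ColumnType): ColumnType = col match {
    case ColumnType(Int32, Some(IntAnnotation(32, true))) => col.copy(annotation = None)
    case ColumnType(Int64, Some(IntAnnotation(64, true))) => col.copy(annotation = None)
    case other                                            => other
  }

  // Assumed fix: normalize before applying the strict match.
  def lenientFilterType(col: ColumnType): Option[String] =
    strictFilterType(normalize(col))

  def main(args: Array[String]): Unit = {
    val annotated = ColumnType(Int32, Some(IntAnnotation(32, true)))
    println(strictFilterType(annotated))  // prints None: the filter is dropped
    println(lenientFilterType(annotated)) // prints Some(IntegerType)
  }
}
```

Other annotations on int32 (such as INT(8, true) or INT(16, true)) are deliberately left untouched by `normalize`, since they are not implied defaults and change how the values should be interpreted.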
I am happy to provide files that show the issue if needed for testing.