Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The new filter API seems to be much slower (or perhaps I'm using it wrong :)
Code using an UnboundRecordFilter:
ColumnRecordFilter.column(column, ColumnPredicates.applyFunctionToBinary(input -> Binary.fromString(value).equals(input)));
vs. code using FilterPredicate:
eq(binaryColumn(column), Binary.fromString(value));
The latter runs about twice as slowly on the same Parquet file (built with 1.6.0rc2).
Note: the reader is constructed with
ParquetReader.builder(new ProtoReadSupport(), path).withFilter(filter).build();
The approach based on the new filter API also seems to create far more garbage (perhaps due to reassembling all the rows before filtering?).
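For context, a complete read loop using the FilterPredicate-based API might look like the sketch below. This is an illustration only: it assumes parquet-mr 1.6.x (the `parquet.*` packages were renamed to `org.apache.parquet.*` in 1.7.0), parquet-protobuf and Hadoop on the classpath, and placeholder values for the file path, column name, and match value.

```java
// Sketch of reading a Parquet file with a pushed-down FilterPredicate.
// Assumes parquet-mr 1.6.x; all names marked "placeholder" are illustrative.
import org.apache.hadoop.fs.Path;
import parquet.filter2.compat.FilterCompat;
import parquet.filter2.predicate.FilterPredicate;
import parquet.hadoop.ParquetReader;
import parquet.io.api.Binary;
import parquet.proto.ProtoReadSupport;

import static parquet.filter2.predicate.FilterApi.binaryColumn;
import static parquet.filter2.predicate.FilterApi.eq;

public class FilterPredicateExample {
  public static void main(String[] args) throws Exception {
    String column = "name";       // placeholder column name
    String value = "some-value";  // placeholder value to match

    // Build the predicate once; FilterCompat wraps it for the reader.
    FilterPredicate pred = eq(binaryColumn(column), Binary.fromString(value));

    ParquetReader<Object> reader = ParquetReader
        .builder(new ProtoReadSupport<>(), new Path(args[0])) // placeholder path
        .withFilter(FilterCompat.get(pred))
        .build();

    // read() returns only records matching the predicate, or null at EOF.
    Object record;
    while ((record = reader.read()) != null) {
      // process matching record
    }
    reader.close();
  }
}
```

In principle a FilterPredicate can be pushed down to skip whole row groups via column statistics, so the slowdown reported above is surprising; it may be that record-level evaluation still assembles each row first.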
Issue Links
- is related to PARQUET-182: FilteredRecordReader skips rows it shouldn't for schema with optional columns (Open)