Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
I'm looking at some benchmarking code comparing Apache ORC vs. Apache Parquet and see that Parquet is quite a bit slower for writes (reads TBD). Based on my investigation, a significant amount of time is spent determining min/max values for binary types.
One quick improvement is to skip the "max" comparison when a value has already been determined to be the new "min". Since min is never greater than max, such a value cannot also be a new "max".
While I'm at it, I will also remove calls to deprecated functions.
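
The min/max shortcut described above can be sketched roughly as follows. This is only an illustration, not the actual Parquet statistics code: the class name `BinaryMinMax` and its methods are hypothetical, and it assumes binary values are compared as unsigned bytes in lexicographic order.

```java
import java.util.Arrays;

/** Hypothetical min/max tracker for binary values; not Parquet's real class. */
final class BinaryMinMax {
    private byte[] min;
    private byte[] max;

    void update(byte[] value) {
        if (min == null) {
            // The first value seeds both bounds.
            min = value;
            max = value;
        } else if (Arrays.compareUnsigned(value, min) < 0) {
            // New minimum. Because min <= max always holds, this value
            // cannot also exceed max, so the max comparison is skipped.
            min = value;
        } else if (Arrays.compareUnsigned(value, max) > 0) {
            // Only values that are not a new minimum are compared to max.
            max = value;
        }
    }

    byte[] getMin() { return min; }
    byte[] getMax() { return max; }
}
```

With this shape, a value that lowers the minimum costs one comparison instead of two. The saving depends on the data: it is largest for descending input and absent for ascending input.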