Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Version: 1.12.2
- None
- None
Description
This PR consists entirely of performance optimizations. In benchmarks with Trino, query performance improves by 5% to 15%, depending on the query, and that measurement includes all the I/O time from S3.
The main modification is to merge all of the functionality of LittleEndianDataInputStream into ByteBufferInputStream, which yields the following benefits:
- Eliminates extra layers of abstraction and method call overhead
- Enables the use of intrinsics for readInt, readLong, etc.
- Makes faster access methods like readFully and skipFully available without the need for helper functions
- Reduces some object creation in the performance-critical path
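To illustrate the kind of difference these benefits refer to, here is a minimal sketch that contrasts a stream-style little-endian read (one byte at a time, combined with shifts) with a direct read from a little-endian `ByteBuffer`. The class `LittleEndianReadSketch` and its methods are hypothetical names used only for illustration. They are not the actual parquet-mr code, and the real `ByteBufferInputStream` changes may look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the parquet-mr implementation.
final class LittleEndianReadSketch {

    // Stream-style read: four single-byte reads combined with shifts,
    // roughly what a wrapper layered over an InputStream has to do.
    static int readIntByteAtATime(ByteArrayInputStream in) {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) { // read() returns -1 at end of input
            throw new IllegalStateException("unexpected end of input");
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    // Direct read: the buffer must already be set to LITTLE_ENDIAN order.
    // A single getInt call is the kind of operation a JIT can usually
    // compile down to one memory load.
    static int readIntFromBuffer(ByteBuffer buf) {
        return buf.getInt();
    }

    // Skipping is only a position update; no helper loop is needed.
    static void skipFully(ByteBuffer buf, int n) {
        if (n < 0 || n > buf.remaining()) {
            throw new IllegalArgumentException("cannot skip " + n + " bytes");
        }
        buf.position(buf.position() + n);
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12, 0, 0, 0x2A, 0, 0, 0};

        int viaStream = readIntByteAtATime(new ByteArrayInputStream(data));

        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int viaBuffer = readIntFromBuffer(buf);
        skipFully(buf, 2);                    // skip two padding bytes
        int next = readIntFromBuffer(buf);

        System.out.println(Integer.toHexString(viaStream)); // 12345678
        System.out.println(Integer.toHexString(viaBuffer)); // 12345678
        System.out.println(next);                           // 42
    }
}
```

Both paths decode the same value. The buffer-based path does so with one call and no intermediate objects, which is the general effect the list above describes.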
This also includes and enables performance optimizations to:
- ByteBitPackingValuesReader
- PlainValuesReader
- RunLengthBitPackingHybridDecoder
Context:
I've been working on improving Parquet reading performance in Trino, mostly by profiling while running performance benchmarks and TPC-DS queries. This PR is a subset of the changes I made, which together have more than doubled the performance of many TPC-DS queries (wall clock time, including the S3 access time). If you are kind enough to accept these changes, I have more I would like to contribute.