Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Input buffers for NonBlockedDecompressor (and NonBlockedCompressor) grow by one chunk at a time as the class receives successive setInput calls. When decompressing a 64MB block with a 4KB chunk size, this leads to thousands of allocations and deallocations totaling gigabytes of memory. Doubling the buffer each time it needs to grow, rather than adding the minimal amount of new space, avoids this.
In a practical scenario I ran into, the change reduced the time taken to read a 140MB Parquet file from 35s to under 2s.
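To illustrate the growth strategy described above, here is a minimal sketch of a buffer that doubles its capacity when a new chunk does not fit. It is not the actual NonBlockedDecompressor code: the class name DoublingInputBuffer, its methods, and the allocation counter are hypothetical and exist only to show the idea, and it assumes the input is accumulated in a heap ByteBuffer.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration of geometric buffer growth.
 * Not the real Parquet class; names and structure are assumptions.
 */
final class DoublingInputBuffer {
    private ByteBuffer buffer;
    private int allocations; // counts buffer allocations, for demonstration only

    DoublingInputBuffer(int initialCapacity) {
        buffer = ByteBuffer.allocate(Math.max(1, initialCapacity));
        allocations = 1;
    }

    /** Appends one chunk, doubling the backing buffer until the chunk fits. */
    void append(byte[] chunk, int offset, int length) {
        if (length > buffer.remaining()) {
            long required = (long) buffer.position() + length;
            if (required > Integer.MAX_VALUE) {
                throw new IllegalStateException("input exceeds maximum buffer size");
            }
            long newCapacity = buffer.capacity();
            while (newCapacity < required) {
                newCapacity *= 2; // geometric growth instead of adding one chunk
            }
            newCapacity = Math.min(newCapacity, Integer.MAX_VALUE);

            ByteBuffer grown = ByteBuffer.allocate((int) newCapacity);
            buffer.flip();     // switch the old buffer to read mode
            grown.put(buffer); // copy the bytes accumulated so far
            buffer = grown;
            allocations++;
        }
        buffer.put(chunk, offset, length);
    }

    int size() {
        return buffer.position();
    }

    int capacity() {
        return buffer.capacity();
    }

    int allocations() {
        return allocations;
    }
}
```

Under these assumptions, starting from a 4KB buffer and appending 4KB chunks until 64MB has accumulated takes 14 reallocations, because the capacity doubles from 4KB to 64MB, instead of one reallocation per chunk. The total number of bytes copied stays proportional to the final size rather than growing quadratically with the number of chunks.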