[PARQUET-77] Improvements in ByteBuffer read path - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.9.0
Component/s: parquet-mr
Labels: None

Description

For Apache Drill, we want to pass in a buffer that we have already allocated (in this case from direct memory), wrapped in a ByteBuffer.
The current effort to add a ByteBuffer read path is great, except that the interface allocates the memory itself: an application has no way to pass in memory it has already allocated or, even better, to provide an allocator.
Additionally, it would be great to be able to use the same approach when decompressing.
As a starting point, here is a patch on top of the ByteBuffer read effort that adds a function in CompatibilityUtils and a ByteBuffer path for the Snappy decompressor. The latter requires Hadoop 2.3, though, so it calls for some discussion.

Please send any feedback and I can make changes or additions.
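
To make the request concrete, here is a minimal sketch of the two options described above: reading into a buffer the caller already owns, and reading through a caller-supplied allocator. All names here (BufferAllocator, DirectBufferAllocator, PageReadSketch, readInto, read) are hypothetical and chosen for illustration only; they are not the parquet-mr API and not the contents of the attached patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical allocator abstraction supplied by the application. */
interface BufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

/** Example allocator that hands out direct (off-heap) buffers. */
final class DirectBufferAllocator implements BufferAllocator {
    int outstanding = 0; // number of buffers allocated and not yet released

    @Override
    public ByteBuffer allocate(int size) {
        outstanding++;
        return ByteBuffer.allocateDirect(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
        // A real allocator would return the memory to its pool here.
        outstanding--;
    }
}

/** Hypothetical read path; a byte[] stands in for the underlying file data. */
final class PageReadSketch {
    /** Option 1: copy into a buffer the caller already allocated. Returns bytes copied. */
    static int readInto(byte[] source, int offset, int length, ByteBuffer dest) {
        int n = Math.min(length, dest.remaining());
        dest.put(source, offset, n);
        return n;
    }

    /** Option 2: let the caller's allocator decide where the memory comes from. */
    static ByteBuffer read(byte[] source, int offset, int length, BufferAllocator allocator) {
        ByteBuffer buf = allocator.allocate(length);
        buf.put(source, offset, length);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }
}
```

In this sketch the reader never decides between heap and direct memory; that choice stays with the application, which is the property Drill needs.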

Issue Links

contains: PARQUET-251 Binary column statistics error when reuse byte[] among rows (Resolved)
is depended upon by: DRILL-1410 Move off of Parquet fork (Resolved)
relates to: PARQUET-1006 ColumnChunkPageWriter uses only heap memory. (Open)
links to: GitHub Pull Request #49

People

Assignee: Jason Altekruse
Reporter: Parth Chandra
Votes: 0
Watchers: 7

Dates

Created: 22/Aug/14 23:06
Updated: 23/Jun/24 03:26
Resolved: 04/Nov/15 17:58