[HIVE-9269] LLAP: introduce low-level cache for ORC - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: llap
Component/s: None
Labels: None

Description

There are two distinct options for caching encoded data in a row-columnar format: caching logical chunks (e.g. for ORC, stripe x column or row group (rg) x column), or caching physical chunks (e.g. for ORC, compression buffers, entire stripes, ...). For highly selective queries, the former will probably result in better cache utilization and fewer undesirable priority phenomena. It will also be easier to use across different formats.
However, because logical chunks are variable-sized, such a cache is harder to implement. The prototype has a form of cache like that, but it has some serious shortcomings in its current form. Additionally, a high-level cache will operate above the ACID logic in the file format and would thus require cache invalidation, which is, as we know, one of the only hard things in CS.
A low-level cache for the ORC case, however, is easier to implement because the uncompressed size of compression buffers is nearly fixed; at the 256k default, these buffers are also sufficiently granular. While it lacks the benefit of having ACID deltas already merged, as a high-level cache would, it will work with ACID out of the box.

This JIRA is to implement the low-level cache.
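
To illustrate why near-fixed buffer sizes simplify the design, here is a minimal sketch of a low-level cache. It is not the LLAP implementation: the class name `LowLevelCache`, the `(file_id, offset)` key, and the LRU eviction policy are all assumptions made for illustration. Only the 256k default buffer size comes from the description above.

```python
from collections import OrderedDict

# Default uncompressed size of an ORC compression buffer, per the description.
BUFFER_SIZE = 256 * 1024


class LowLevelCache:
    """Hypothetical LRU cache of uncompressed buffers keyed by (file_id, offset)."""

    def __init__(self, max_buffers):
        # Because buffers are nearly fixed-size, capacity can be counted
        # in buffers instead of tracking variable-sized allocations.
        self.max_buffers = max_buffers
        self._buffers = OrderedDict()

    def get(self, file_id, offset):
        key = (file_id, offset)
        data = self._buffers.get(key)
        if data is not None:
            self._buffers.move_to_end(key)  # mark as most recently used
        return data

    def put(self, file_id, offset, data):
        if len(data) > BUFFER_SIZE:
            raise ValueError("buffer exceeds the fixed slot size")
        key = (file_id, offset)
        self._buffers[key] = data
        self._buffers.move_to_end(key)
        while len(self._buffers) > self.max_buffers:
            self._buffers.popitem(last=False)  # evict least recently used
```

In this sketch entries are addressed by physical location in a file, so the cache needs no knowledge of rows, columns, or ACID deltas; any merging would still happen in the reader above it.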

People

Assignee: Sergey Shelukhin

Reporter: Sergey Shelukhin

Votes: 0

Watchers: 1

Dates

Created: 06/Jan/15 21:13

Updated: 17/Feb/16 01:32

Resolved: 17/Jan/15 02:06