[HIVE-13985] ORC improvements for reducing the file system calls in task side - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0, 2.2.0
Fix Version/s: 1.3.0, 2.1.1, 2.2.0
Component/s: ORC
Labels: None
Target Version/s: 1.3.0, 2.2.0

Description

HIVE-13840 fixed some issues with additional file system invocations during split generation. Similarly, this jira fixes issues with additional file system invocations on the task side.

To avoid reading footers on the task side, users can set hive.orc.splits.include.file.footer to true, which serializes the ORC footers into the splits. However, this also serializes information such as column statistics and other metadata that is not required for reading an ORC split on the task side. We can reduce the payload by serializing only the minimum required information (stripe information, types, compression details). The smaller payload can potentially avoid OOMs in the application master (AM) during split generation.

This jira also addresses other issues concerning the AM cache. The local cache used by the AM is a soft reference cache, which can introduce unpredictability across multiple runs of the same query. We can instead cache the serialized footer in the local cache and use a strong reference cache, which should avoid memory pressure and give better predictability.
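The strong reference cache idea can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name `SerializedFooterCache`, the string file key, and the entry-count bound are all assumptions made for illustration. It stores footers as already-serialized byte arrays and evicts the least recently used entry once a fixed limit is reached, so the cache contents depend on access order rather than on garbage collector behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a bounded cache that holds serialized footers by
 * strong reference. Entries leave the cache only through LRU eviction,
 * never because the garbage collector cleared a soft reference.
 */
final class SerializedFooterCache {
    private final LinkedHashMap<String, byte[]> entries;

    SerializedFooterCache(final int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Called after each insertion; drop the LRU entry when over the bound.
                return size() > maxEntries;
            }
        };
    }

    /** Stores a defensive copy of the serialized footer under the given key. */
    synchronized void put(String fileKey, byte[] serializedFooter) {
        entries.put(fileKey, serializedFooter.clone());
    }

    /** Returns a copy of the cached bytes, or null when the key is absent. */
    synchronized byte[] get(String fileKey) {
        byte[] cached = entries.get(fileKey);
        return cached == null ? null : cached.clone();
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A real cache would more likely be bounded by total bytes than by entry count, and would need a key that changes when the file changes (for example path plus modification time); both choices are left out here to keep the sketch short.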

One other improvement: when hive.orc.splits.include.file.footer is set to false, the task side makes one additional file system call to determine the size of the file. Serializing the file length in the ORC split avoids this call.
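A rough sketch of carrying the file length in the split follows. The class `SplitWithLength`, its fields, and its wire format are hypothetical and do not reflect the actual split class changed by the patches. The point it shows is that an ORC reader locates the file tail relative to the end of the file, so a task that already knows the length can compute the read offset without asking the file system for the file status.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical split descriptor that carries the total file length. */
final class SplitWithLength {
    final String path;
    final long start;       // byte offset where this split begins
    final long length;      // number of bytes covered by this split
    final long fileLength;  // total file size, recorded at split generation

    SplitWithLength(String path, long start, long length, long fileLength) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.fileLength = fileLength;
    }

    /** Serializes the split, including the file length, to bytes. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(path);
            out.writeLong(start);
            out.writeLong(length);
            out.writeLong(fileLength);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in the same order they were written. */
    static SplitWithLength fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String path = in.readUTF();
            long start = in.readLong();
            long length = in.readLong();
            long fileLength = in.readLong();
            return new SplitWithLength(path, start, length, fileLength);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Offset at which a reader would start reading the file tail, computed
     * from the carried length instead of a file status lookup.
     */
    long tailReadOffset(long tailReadBytes) {
        return Math.max(0L, fileLength - tailReadBytes);
    }
}
```

On the task side, a split deserialized with `fromBytes` would supply `fileLength` directly, so the only remaining file system interaction is the read of the tail itself.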

Attachments

- HIVE-13985-branch-2.1.patch, 14/Jun/16 18:44, 147 kB, Prasanth Jayachandran
- HIVE-13985-branch-1.patch, 15/Jun/16 09:11, 123 kB, Prasanth Jayachandran
- HIVE-13985-branch-1.patch, 16/Jun/16 08:28, 123 kB, Prasanth Jayachandran
- HIVE-13985-branch-1.patch, 16/Jun/16 19:59, 126 kB, Prasanth Jayachandran
- HIVE-13985-branch-1.patch, 16/Jun/16 21:14, 126 kB, Prasanth Jayachandran
- HIVE-13985.6.patch, 20/Jun/16 20:34, 167 kB, Prasanth Jayachandran
- HIVE-13985.5.patch, 18/Jun/16 02:25, 167 kB, Prasanth Jayachandran
- HIVE-13985.4.patch, 17/Jun/16 09:38, 167 kB, Prasanth Jayachandran
- HIVE-13985.3.patch, 16/Jun/16 23:25, 166 kB, Prasanth Jayachandran
- HIVE-13985.2.patch, 14/Jun/16 17:05, 147 kB, Prasanth Jayachandran
- HIVE-13985.1.patch, 14/Jun/16 16:46, 150 kB, Prasanth Jayachandran

Issue Links

links to

RB - For master patch

People

Assignee: Prasanth Jayachandran
Reporter: Prasanth Jayachandran
Votes: 0
Watchers: 4

Dates

Created: 09/Jun/16 21:00
Updated: 08/Dec/16 14:38
Resolved: 21/Jun/16 01:10