[HIVE-14165] Remove Hive file listing during split computation - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 4.0.0-alpha-1
Component/s: None
Labels:
- pull-request-available

Target Version/s: 3.0.0

Description

The Hive-side file listing in FetchOperator.java is unnecessary, because Hadoop's FileInputFormat.java lists the files during split computation anyway to determine their sizes. One way to remove it is to catch the InvalidInputException thrown by FileInputFormat#getSplits() on the Hive side instead of listing the files beforehand.

For select queries on partitioned tables stored in S3, this results in a 2x speedup.
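
The idea can be illustrated with a minimal, self-contained sketch. It does not use the Hive or Hadoop classes: `CountingStore`, `MissingInputException`, and both `splitsWith...` methods are hypothetical stand-ins invented for illustration, with `MissingInputException` playing the role of the exception thrown by FileInputFormat#getSplits(). It only shows why dropping the pre-listing saves listing calls, assuming split computation lists each path itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SplitSketch {

    /** Hypothetical stand-in for the exception raised when an input path is missing. */
    static final class MissingInputException extends Exception {
        MissingInputException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a file system plus input format; counts listing calls. */
    static final class CountingStore {
        private final Set<String> existingPaths;
        int listCalls = 0;

        CountingStore(Set<String> existingPaths) { this.existingPaths = existingPaths; }

        /** Models the Hive-side pre-check: one listing call per path. */
        boolean exists(String path) {
            listCalls++;
            return existingPaths.contains(path);
        }

        /** Models split computation, which lists the path itself and fails if it is missing. */
        List<String> getSplits(String path) throws MissingInputException {
            listCalls++;
            if (!existingPaths.contains(path)) {
                throw new MissingInputException("Input path does not exist: " + path);
            }
            return List.of(path + "#split0");
        }
    }

    /** Approach with pre-listing: check each partition first, then compute its splits. */
    static List<String> splitsWithPreListing(CountingStore store, List<String> partitions) {
        List<String> splits = new ArrayList<>();
        for (String partition : partitions) {
            if (!store.exists(partition)) {
                continue; // skip missing partitions before split computation
            }
            try {
                splits.addAll(store.getSplits(partition));
            } catch (MissingInputException e) {
                // The path vanished between the two calls; treat it as empty.
            }
        }
        return splits;
    }

    /** Approach without pre-listing: compute splits directly and catch the failure. */
    static List<String> splitsWithCatch(CountingStore store, List<String> partitions) {
        List<String> splits = new ArrayList<>();
        for (String partition : partitions) {
            try {
                splits.addAll(store.getSplits(partition));
            } catch (MissingInputException e) {
                // Missing partition: contributes no splits, and no extra listing was needed.
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("p=1", "p=2", "p=3");
        CountingStore a = new CountingStore(Set.of("p=1", "p=2"));
        CountingStore b = new CountingStore(Set.of("p=1", "p=2"));
        System.out.println(splitsWithPreListing(a, partitions) + " listings=" + a.listCalls);
        System.out.println(splitsWithCatch(b, partitions) + " listings=" + b.listCalls);
    }
}
```

Both methods return the same splits; the second issues one listing per partition instead of up to two, which matters most when each listing is a remote call.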

Attachments

- HIVE-14165.02.patch (19/Aug/16 15:53, 4 kB, Abdullah Yousufi)
- HIVE-14165.03.patch (19/Aug/16 23:57, 4 kB, Abdullah Yousufi)
- HIVE-14165.04.patch (21/Dec/16 01:04, 4 kB, Sahil Takiar)
- HIVE-14165.05.patch (21/Dec/16 03:08, 4 kB, Sahil Takiar)
- HIVE-14165.06.patch (21/Dec/16 07:36, 4 kB, Sahil Takiar)
- HIVE-14165.07.patch (22/Mar/17 23:11, 4 kB, Sahil Takiar)
- HIVE-14165.patch (18/Aug/16 03:17, 4 kB, Abdullah Yousufi)

Issue Links

is depended upon by
- HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)

is related to
- MAPREDUCE-6760 LocatedFileStatusFetcher to use listFiles(recursive) (Open)

relates to
- HADOOP-13208 S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the pseudo-tree of directories (Resolved)

links to
- GitHub Pull Request #1866
- Review Board

People

Assignee: Peter Varga
Reporter: Abdullah Yousufi
Votes: 0
Watchers: 15

Dates

Created: 05/Jul/16 22:44
Updated: 17/Nov/22 08:55
Resolved: 19/Jan/21 09:52

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 40m