Hive should support configuring multi-job queries so that all intermediate MR jobs write to HDFS while only the final MR job writes to S3.
This will be useful for implementing parallel renames on S3. Since writes to HDFS are typically much faster than writes to S3, it makes more sense to stage intermediate data on HDFS and only write the final job's output to S3.
The advantage is that any copying of data from the scratch directory to the final table directory can then be done server-side, within the blobstore. The MoveTask simply renames data from the scratch directory to the final table location, which should translate to a server-side COPY request. This way HiveServer2 doesn't have to actually copy any data; it just tells the blobstore to do all the work.
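The decision described above can be sketched as a small piece of path-planning logic. This is an illustrative sketch only, not Hive's actual implementation: the class and method names (`ScratchDirPlanner`, `chooseScratchDir`, `isBlobstore`) and the `_tmp.scratch/` subdirectory are hypothetical.

```java
import java.net.URI;

public class ScratchDirPlanner {

    // Schemes treated as blobstores; set of schemes is an assumption
    // modeled on the common Hadoop S3 connectors.
    private static boolean isBlobstore(URI location) {
        String scheme = location.getScheme();
        return "s3a".equals(scheme) || "s3n".equals(scheme) || "s3".equals(scheme);
    }

    /**
     * Intermediate jobs stage their output on HDFS; only the final job
     * writes into the blobstore next to the target table, so the MoveTask
     * rename stays within the blobstore (a server-side COPY) instead of
     * re-uploading the data through HiveServer2.
     */
    public static URI chooseScratchDir(URI tableLocation, URI hdfsScratch,
                                       boolean isFinalJob) {
        if (isFinalJob && isBlobstore(tableLocation)) {
            // Hypothetical scratch subdirectory under the table location.
            return tableLocation.resolve("_tmp.scratch/");
        }
        return hdfsScratch;
    }

    public static void main(String[] args) {
        URI table = URI.create("s3a://bucket/warehouse/t/");
        URI hdfs = URI.create("hdfs://nn:8020/tmp/hive/scratch/");
        // Intermediate job -> HDFS scratch dir
        System.out.println(chooseScratchDir(table, hdfs, false));
        // Final job -> scratch dir on the blobstore, beside the table
        System.out.println(chooseScratchDir(table, hdfs, true));
    }
}
```

With this shape, a rename from `s3a://bucket/warehouse/t/_tmp.scratch/` to the table directory never leaves the bucket, which is what lets the blobstore service the move as a COPY.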
- relates to: SPARK-21514 (Hive has updated with new support for S3 and InsertIntoHiveTable.scala should update also)