[HIVE-17113] Duplicate bucket files can get written to table by runaway task - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Query Processor
Labels: None

Description

Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:

1. Task attempt A_0 starts, but then stops making progress.
2. The job was running with speculative execution on, so task attempt A_1 is started.
3. Task attempt A_1 finishes execution and saves its output to the temp directory.
4. A task kill is sent to A_0, though this does not appear to actually kill A_0.
5. The job for the query finishes and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files.
6. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.
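
The sequence above is a check-then-move race: the duplicate check sees only the files present when it runs, while the move picks up whatever is in the temp directory later. The sketch below models that in memory. It is an illustration only, not Hive's code: the class and method names are hypothetical, the directories are plain lists, and the `<taskId>_<attemptId>` file naming is an assumption. The `moveCheckedFiles` variant shows one possible mitigation (moving only the files that passed the check), which is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: file names are assumed to look like "<taskId>_<attemptId>"
// and directories are in-memory lists. None of this is Hive's actual code.
final class BucketRaceSketch {

    // Hypothetical helper: the task (bucket) ID is the part before the last underscore.
    static String taskId(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // Keeps the first file seen for each task ID and drops the rest from tempDir.
    // Returns the files that survived the check.
    static List<String> removeDuplicates(List<String> tempDir) {
        Map<String, String> keptByTask = new LinkedHashMap<>();
        for (String file : tempDir) {
            keptByTask.putIfAbsent(taskId(file), file);
        }
        List<String> kept = new ArrayList<>(keptByTask.values());
        tempDir.retainAll(kept);
        return kept;
    }

    // Racy variant: whatever is in tempDir at move time ends up in the final location.
    static List<String> moveWholeDirectory(List<String> tempDir) {
        List<String> finalDir = new ArrayList<>(tempDir);
        tempDir.clear();
        return finalDir;
    }

    // Possible mitigation: move only the files that passed the duplicate check.
    static List<String> moveCheckedFiles(List<String> tempDir, List<String> checked) {
        List<String> finalDir = new ArrayList<>();
        for (String file : checked) {
            if (tempDir.remove(file)) {
                finalDir.add(file);
            }
        }
        return finalDir;
    }

    public static void main(String[] args) {
        List<String> tempDir = new ArrayList<>(List.of("000000_1")); // output of A_1
        List<String> checked = removeDuplicates(tempDir);           // check runs, finds no duplicates
        tempDir.add("000000_0");                                     // A_0 finishes after the check
        System.out.println("checked files: " + checked);
        System.out.println("whole-directory move: " + moveWholeDirectory(new ArrayList<>(tempDir)));
        System.out.println("checked-files move: " + moveCheckedFiles(tempDir, checked));
    }
}
```

In this model the whole-directory move carries both attempts' files for the same bucket into the final location, while the checked-files move leaves the late file behind in the temp directory.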

Attachments

- HIVE-17113.1.patch (2 kB, Jason Dere, 17/Jul/17 23:50)
- HIVE-17113.2.patch (8 kB, Jason Dere, 26/Jul/17 01:35)
- HIVE-17113.3.patch (9 kB, Jason Dere, 26/Jul/17 23:57)

Issue Links

relates to:

- HIVE-17813 hive.exec.move.files.from.source.dir does not work with partitioned tables (Closed)
- HIVE-17963 Fix for HIVE-17113 can be improved for non-blobstore filesystems (Closed)

People

Assignee: Jason Dere
Reporter: Jason Dere
Votes: 1
Watchers: 8

Dates

Created: 17/Jul/17 22:06
Updated: 22/May/18 23:58
Resolved: 31/Jul/17 23:23