SPARK-8578: Should ignore user defined output committer when appending data
Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels: None

Description

When appending data to a file system via the Hadoop API, it is safer to ignore user-defined output committer classes such as DirectParquetOutputCommitter, because task failures are relatively hard to handle in this case. For example, DirectParquetOutputCommitter writes directly to the output directory to boost write performance when working with S3. However, the Hadoop API provides no general way to determine the output file paths written by a specific task, so we don't know how to revert a failed append job. (When overwriting, we can simply remove the whole output directory.)
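A minimal Scala sketch of the idea (an illustration, not the actual patch): when the write is an append, fall back to Hadoop's default FileOutputCommitter instead of any user-configured committer. The newOutputCommitter helper and its isAppend / userDefinedCommitterClass parameters are hypothetical names introduced here for illustration.

{code:scala}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

// Hypothetical helper illustrating the committer selection logic.
def newOutputCommitter(
    outputPath: Path,
    context: TaskAttemptContext,
    isAppend: Boolean,
    userDefinedCommitterClass: Option[Class[_ <: OutputCommitter]]): OutputCommitter = {
  if (isAppend) {
    // Ignore any user-defined committer (e.g. DirectParquetOutputCommitter).
    // FileOutputCommitter stages task output in a temporary directory and
    // moves it into the destination only on commit, so a failed append task
    // leaves no partial files behind in the output directory.
    new FileOutputCommitter(outputPath, context)
  } else {
    userDefinedCommitterClass match {
      case Some(clazz) =>
        // Assumes the committer exposes the (Path, TaskAttemptContext)
        // constructor that FileOutputCommitter subclasses typically have.
        clazz.getConstructor(classOf[Path], classOf[TaskAttemptContext])
          .newInstance(outputPath, context)
      case None =>
        new FileOutputCommitter(outputPath, context)
    }
  }
}
{code}

With overwrite, reverting is trivial (delete the whole output directory), which is why a direct committer is only a problem for append.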

Attachments

Issue Links

Activity

People

Assignee: Yin Huai
Reporter: Cheng Lian
Votes: 0
Watchers: 4

Dates

Created:
Updated:
Resolved: