[SPARK-18024] Introduce an internal commit protocol API along with OutputCommitter implementation - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: SQL
Labels: None
Target Version/s: 2.1.0

Description

This commit protocol API should wrap around Hadoop's output committer. Later we can expand the API to cover streaming commits.

The existing Hadoop output committer API is insufficient for streaming use cases:

1. It has no way for tasks to pass information back to the driver.

2. It relies on Hadoop's configuration hashmap to pass information from the driver to the executors, largely because Hadoop MapReduce has no support for language integration and serialization. Spark can pass this information more naturally through automatic closure serialization.
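
The two gaps listed above can be illustrated with a minimal sketch. It is not the API introduced by this ticket: every name in it (`SketchCommitProtocol`, `TaskCommitMessage`, `run_task`, `run_job`, the `_temporary` path layout) is hypothetical, and `copy.deepcopy` merely stands in for serializing the protocol object to an executor. It assumes a design in which the driver ships the protocol object itself to each task, and each task returns a commit message that the driver receives at job commit.

```python
import copy


class TaskCommitMessage:
    """Hypothetical payload that a task hands back to the driver."""

    def __init__(self, payload):
        self.payload = payload


class SketchCommitProtocol:
    """Hypothetical commit protocol; all method names are illustrative."""

    def __init__(self, job_id, output_path):
        # Plain fields travel with the object when it is shipped to a task,
        # so no side-channel key/value map is needed.
        self.job_id = job_id
        self.output_path = output_path
        self.committed_files = []

    # Driver side
    def setup_job(self):
        self.committed_files = []

    def commit_job(self, messages):
        # The driver sees what every task reported.
        for message in messages:
            self.committed_files.extend(message.payload)

    # Task side
    def new_task_file(self, task_id, name):
        return f"{self.output_path}/_temporary/{self.job_id}/task-{task_id}/{name}"

    def commit_task(self, task_id, files):
        # Return information to the driver instead of discarding it.
        return TaskCommitMessage(list(files))


def run_task(shipped_protocol, task_id):
    """Simulates one executor task working on its own copy of the protocol."""
    path = shipped_protocol.new_task_file(task_id, f"part-{task_id:05d}")
    return shipped_protocol.commit_task(task_id, [path])


def run_job(num_tasks):
    """Simulates the driver: set up, ship to tasks, collect, commit."""
    protocol = SketchCommitProtocol("job-0", "/tmp/out")
    protocol.setup_job()
    messages = [
        # deepcopy stands in for serializing the protocol to an executor.
        run_task(copy.deepcopy(protocol), task_id)
        for task_id in range(num_tasks)
    ]
    protocol.commit_job(messages)
    return protocol.committed_files
```

In this sketch, calling `run_job(2)` returns the two file paths that the tasks reported, which is the task-to-driver channel that point 1 says is missing from the Hadoop output committer API.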

Issue Links

links to

[Github] Pull Request #15696 (rxin)

[Github] Pull Request #15707 (rxin)

People

Assignee: Reynold Xin

Reporter: Reynold Xin

Votes: 0

Watchers: 3

Dates

Created: 20/Oct/16 07:04

Updated: 02/Nov/16 23:11

Resolved: 01/Nov/16 05:23