  1. Spark
  2. SPARK-17924 Consolidate streaming and batch write path
  3. SPARK-18024

Introduce an internal commit protocol API along with OutputCommitter implementation

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      This commit protocol API should wrap around Hadoop's output committer. Later we can expand the API to cover streaming commits.

      The existing Hadoop output committer API is insufficient for streaming use cases:

      1. It has no way for tasks to pass information back to the driver.

      2. It relies on the weird Hadoop hashmap to pass information from the driver to the executors, largely because there is no support for language integration and serialization in Hadoop MapReduce. Spark has more natural support for passing information through automatic closure serialization.
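
To make the two gaps above concrete, here is a minimal sketch of what a commit protocol with a task-to-driver return channel could look like. It is not Spark's or Hadoop's actual API: `TaskCommitMessage`, `InMemoryCommitProtocol`, `run_task` and `run_job` are hypothetical names, no files are written, and `copy.deepcopy` merely stands in for serializing the protocol object from the driver to the executors.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TaskCommitMessage:
    """Hypothetical payload a task hands back to the driver when it commits."""
    task_id: int
    files: list


@dataclass
class InMemoryCommitProtocol:
    """Hypothetical protocol object, created on the driver.

    Its fields travel to the tasks with the object itself, so no shared
    string-keyed map is needed to carry driver-side settings.
    """
    job_id: str
    output_dir: str
    committed: list = field(default_factory=list)

    def commit_task(self, task_id, file_names):
        # Task side: report which files this task produced.
        paths = [f"{self.output_dir}/{name}" for name in file_names]
        return TaskCommitMessage(task_id, paths)

    def commit_job(self, messages):
        # Driver side: consume one message per successful task.
        for msg in sorted(messages, key=lambda m: m.task_id):
            self.committed.extend(msg.files)


def run_task(shipped_protocol, task_id):
    """Simulates an executor working on its own copy of the protocol."""
    protocol = copy.deepcopy(shipped_protocol)  # stand-in for deserialization
    return protocol.commit_task(task_id, [f"part-{task_id:05d}"])


def run_job(num_tasks):
    driver_protocol = InMemoryCommitProtocol(job_id="job-0", output_dir="/tmp/out")
    messages = [run_task(driver_protocol, t) for t in range(num_tasks)]
    driver_protocol.commit_job(messages)  # task results flow back to the driver
    return driver_protocol.committed


if __name__ == "__main__":
    print(run_job(2))
```

The point of the sketch is the shape of the data flow: the driver's protocol object is copied to each task, and each task's commit returns a message that the driver consumes at job commit. A streaming variant could reuse the same messages to record what was committed per batch.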

          People

            Assignee: rxin Reynold Xin
            Reporter: rxin Reynold Xin