Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-30666

Reliable single-stage accumulators

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • 3.1.0
    • None
    • Spark Core
    • None
    • Patch

    Description

      This proposes a pragmatic improvement to allow for reliable single-stage accumulators. Under the assumption that a given stage / partition / rdd produces identical results, non-deterministic code produces identical accumulator increments on success. Rerunning partitions for any reason should always produce the same increments per partition on success.

      With this pragmatic approach, increments from individual partitions / tasks are only merged into the accumulator on driver side for the first time per partition. This is useful for accumulators registered with countFailedValues == false. Hence, the accumulator aggregates all successful partitions only once.

      The implementations require extra memory that scales with the number of partitions.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              enricomi Enrico Minack
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: