Spark / SPARK-733

Add documentation on use of accumulators in lazy transformation


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.1, 1.3.0
    • Component/s: Documentation
    • Labels: None

    Description

      Accumulator updates are side effects of RDD computations. Unlike RDDs, accumulators do not carry lineage that would allow their values to be recomputed when they are accessed on the master.

      This can lead to confusion when accumulators are used in lazy transformations like `map`:

          val acc = sc.accumulator(0)
          data.map { x => acc += x; f(x) }
          // Here, acc is still 0 because no action has caused the `map` to be computed.
      

      As far as I can tell, our documentation only includes examples of using accumulators in `foreach`, for which this problem does not occur.

      This pattern of using accumulators in map() occurs in Bagel and other Spark code found in the wild.

      It might be nice to document this behavior in the accumulators section of the Spark programming guide.
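      The laziness at issue can be reproduced without a cluster using Scala's collection views, whose `map` is likewise deferred until the view is forced; this is a minimal sketch of the analogous behavior (the object name and sample data are mine, and forcing the view stands in for running a Spark action):

      ```scala
      object LazyAccumulatorDemo {
        def main(args: Array[String]): Unit = {
          var acc = 0
          // `view` makes `map` lazy, like an RDD transformation:
          // the function body does not run when `map` is called.
          val mapped = List(1, 2, 3).view.map { x => acc += x; x * 2 }
          println(acc)               // still 0: nothing has forced the map yet
          val result = mapped.toList // forcing the view plays the role of an action
          println(acc)               // 6: the side effect has now run once per element
          assert(result == List(2, 4, 6))
        }
      }
      ```

      The same ordering applies to the Spark example above: reading `acc.value` before any action (such as `count()` or `collect()`) observes the initial value, because the side-effecting closure has not yet executed.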

    Attachments

    Issue Links

    Activity

    People

      Assignee: Unassigned
      Reporter: Josh Rosen (joshrosen)
      Votes: 0
      Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: