Spark / SPARK-12760

Inaccurate description of the difference between local and cluster mode in closure handling

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: Documentation
    • Labels:
      None

      Description

      The Spark documentation has an example illustrating how `local` and `cluster` mode can differ: http://spark.apache.org/docs/latest/programming-guide.html#example

      "In local mode with a single JVM, the above code will sum the values within the RDD and store it in counter. This is because both the RDD and the variable counter are in the same memory space on the driver node."

      However, the above doesn't seem to be true. Even in `local` mode, the counter value should still be 0: the variable is summed up in the executor's memory space, while the value in the driver's memory space is never updated. I tested this snippet and verified that in `local` mode the value is indeed still 0.

      Is the doc wrong, or am I perhaps missing something the doc is trying to say?
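
      The effect described above can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark code and not the snippet from the programming guide. It assumes that the closure's state is serialized before the task runs, and it uses a `pickle` round trip as a stand-in for that step. The names `run_on_executor`, `driver_state`, and `executor_state` are hypothetical and chosen only for illustration.

```python
import pickle


def run_on_executor(task_state, data):
    """Simulate running a task on an executor (hypothetical helper).

    The pickle round trip stands in for shipping the closure to the
    executor: the task works on its own copy of the state, not on the
    object that lives in the driver.
    """
    executor_state = pickle.loads(pickle.dumps(task_state))
    for x in data:
        executor_state["counter"] += x  # updates the executor's copy only
    return executor_state


# State as seen by the "driver".
driver_state = {"counter": 0}
data = [1, 2, 3, 4, 5]

executor_state = run_on_executor(driver_state, data)

print("driver counter:", driver_state["counter"])      # 0
print("executor counter:", executor_state["counter"])  # 15
```

      In this sketch the sum ends up only in the copy, which matches the observation that the driver-side counter stays at 0. Whether real Spark serializes the closure in a given mode is exactly what this issue asks the documentation to state accurately.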


            People

            • Assignee: Mortada Mehyar (mortada)
            • Reporter: Mortada Mehyar (mortada)