Spark / SPARK-12760

Inaccurate description of the difference between local and cluster mode in closure handling

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: Documentation
    • Labels: None

    Description

      The Spark documentation has an example illustrating how `local` and `cluster` modes can differ: http://spark.apache.org/docs/latest/programming-guide.html#example

      "In local mode with a single JVM, the above code will sum the values within the RDD and store it in counter. This is because both the RDD and the variable counter are in the same memory space on the driver node."

      However, this does not seem to be true. Even in `local` mode the counter should still be 0, because the values are summed into the executor's copy of the variable while the driver's copy stays at 0. I tested this snippet and verified that in `local` mode the value is indeed still 0.

      Is the doc wrong, or am I missing something it is trying to say?
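
The behaviour described in this report can be sketched without Spark. The snippet below is only an illustration and does not use the Spark API: `SumIntoCounter` and `run_on_executor` are hypothetical names, and `copy.deepcopy` stands in, as an assumption, for the serialization step that hands the executor its own copy of the closure.

```python
import copy


class SumIntoCounter:
    """Hypothetical stand-in for a closure that captures a `counter` variable."""

    def __init__(self):
        self.counter = 0

    def __call__(self, x):
        # Mutates whichever copy of the closure this call runs on.
        self.counter += x


def run_on_executor(closure, data):
    """Hypothetical executor: works on a copy of the closure, not the original.

    copy.deepcopy is an assumption standing in for serializing the closure
    and deserializing it on the executor side.
    """
    shipped = copy.deepcopy(closure)
    for x in data:
        shipped(x)
    return shipped.counter


driver_closure = SumIntoCounter()
executor_total = run_on_executor(driver_closure, [1, 2, 3, 4, 5])

print("executor copy:", executor_total)          # executor copy: 15
print("driver copy:", driver_closure.counter)    # driver copy: 0
```

If the executor really does work on a copy in `local` mode too, the driver's value stays at 0 even though everything runs in a single process, which matches the result reported above.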

          People

            Assignee: Mortada Mehyar (mortada)
            Reporter: Mortada Mehyar (mortada)
            Votes: 0
            Watchers: 3