Spark / SPARK-10714

Refactor PythonRDD to decouple iterator computation from PythonRDD


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: PySpark, Spark Core
    • Labels:
      None
    • Target Version/s:

      Description

      The idea is that most of the logic for calling Python has nothing to do with RDDs (it is really just communication over a socket; there is nothing distributed about it), and it depends on RDD only because it was originally written that way.

      If we extract that functionality, we can apply it to areas of the code that don't depend on RDDs, and also make it easier to test.
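A minimal sketch of the decoupling idea follows. It is written in Python for brevity, not in Spark's Scala, and it is not Spark's actual code. The names `SocketRunner`, `fake_worker` and `connect_to_fake_worker` are hypothetical, as are the 4-byte length-prefixed framing and the `-1` end-of-stream marker. The runner does only one thing: it turns an input iterator into an output iterator by talking to a worker over a socket. Because it knows nothing about RDDs, it can be tested against an in-process fake worker, which is the testability benefit the description mentions.

```python
import socket
import struct
import threading
from typing import Callable, Iterator

END_OF_STREAM = -1  # hypothetical sentinel length marking the end of a stream


def write_frame(sock_file, payload: bytes) -> None:
    # Length-prefixed framing: 4-byte big-endian signed length, then payload.
    sock_file.write(struct.pack(">i", len(payload)))
    sock_file.write(payload)


def read_frames(sock_file) -> Iterator[bytes]:
    # Yield payloads until the end-of-stream marker is read.
    while True:
        (length,) = struct.unpack(">i", sock_file.read(4))
        if length == END_OF_STREAM:
            return
        yield sock_file.read(length)


class SocketRunner:
    """Hypothetical runner: maps an input iterator to an output iterator by
    exchanging frames with a worker over a socket. It has no notion of RDDs,
    partitions or scheduling."""

    def __init__(self, connect: Callable[[], socket.socket]):
        self._connect = connect  # factory returning a connected worker socket

    def compute(self, inputs: Iterator[bytes]) -> Iterator[bytes]:
        sock = self._connect()
        out = sock.makefile("wb")
        inp = sock.makefile("rb")

        def feed() -> None:
            # Send input on a separate thread so that reading output on the
            # caller's thread cannot deadlock against a full socket buffer.
            for item in inputs:
                write_frame(out, item)
            out.write(struct.pack(">i", END_OF_STREAM))
            out.flush()

        writer = threading.Thread(target=feed, daemon=True)
        writer.start()
        try:
            yield from read_frames(inp)
        finally:
            writer.join()
            inp.close()
            out.close()
            sock.close()


def fake_worker(sock: socket.socket) -> None:
    # Hypothetical stand-in for a Python worker process: upper-cases records.
    with sock, sock.makefile("rb") as r, sock.makefile("wb") as w:
        for payload in read_frames(r):
            write_frame(w, payload.upper())
        w.write(struct.pack(">i", END_OF_STREAM))
        w.flush()


def connect_to_fake_worker() -> socket.socket:
    # A socket pair plus a thread replaces launching a real worker process.
    ours, theirs = socket.socketpair()
    threading.Thread(target=fake_worker, args=(theirs,), daemon=True).start()
    return ours


if __name__ == "__main__":
    runner = SocketRunner(connect_to_fake_worker)
    print(list(runner.compute(iter([b"spark", b"python"]))))
    # [b'SPARK', b'PYTHON']
```

Passing the connection factory (`connect`) into the constructor is what lets a test swap in `socket.socketpair` for a real worker, and nothing in `compute` would change if the input iterator came from somewhere other than an RDD partition.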


              People

              • Assignee:
                Reynold Xin (rxin)
                Reporter:
                Reynold Xin (rxin)
              • Votes:
                0
                Watchers:
                3
