SPARK-10714

Refactor PythonRDD to decouple iterator computation from PythonRDD

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: PySpark, Spark Core
    • Labels: None

    Description

      Most of the logic for calling Python has nothing to do with RDDs: it is really just communicating over a socket, and there is nothing distributed about it. It currently depends on RDD only because it was written that way.

      If we extract that functionality, we can apply it to areas of the code that don't depend on RDDs, and the logic also becomes easier to test.
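
To make the proposed separation concrete, here is a minimal sketch in Python rather than Spark's Scala. It is not the actual Spark change: `SocketRunner`, `compute`, and `_worker` are hypothetical names, the worker is a thread standing in for the Python worker process, and items are assumed to be single-line strings. The point is only that the runner takes an iterator in and hands an iterator back, with no reference to an RDD.

```python
import socket
import threading
from typing import Callable, Iterable, Iterator


def _worker(conn: socket.socket, func: Callable[[str], str]) -> None:
    """Hypothetical stand-in for the worker: read a line, apply func, write a line."""
    with conn:
        with conn.makefile("r", encoding="utf-8") as reader:
            with conn.makefile("w", encoding="utf-8") as writer:
                for line in reader:
                    writer.write(func(line.rstrip("\n")) + "\n")


class SocketRunner:
    """Hypothetical runner that knows about iterators and a socket, not RDDs."""

    def __init__(self, func: Callable[[str], str]) -> None:
        self.func = func

    def compute(self, inputs: Iterable[str]) -> Iterator[str]:
        ours, theirs = socket.socketpair()
        threading.Thread(
            target=_worker, args=(theirs, self.func), daemon=True
        ).start()

        def feed() -> None:
            # Write inputs on a separate thread so reading results cannot deadlock.
            with ours.makefile("w", encoding="utf-8") as writer:
                for item in inputs:
                    writer.write(item + "\n")
            # Signal end of input; the worker then sees EOF and finishes.
            ours.shutdown(socket.SHUT_WR)

        threading.Thread(target=feed, daemon=True).start()

        with ours:
            with ours.makefile("r", encoding="utf-8") as reader:
                for line in reader:
                    yield line.rstrip("\n")


if __name__ == "__main__":
    # Any iterator works as input; nothing here is distributed.
    print(list(SocketRunner(str.upper).compute(iter(["a", "b", "c"]))))
```

Because `compute` accepts any iterable, a unit test can drive it with a plain list, and an RDD-based caller could, under the same assumptions, pass a partition's iterator to it unchanged.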

            People

              Assignee: Reynold Xin (rxin)
              Reporter: Reynold Xin (rxin)
              Votes: 0
              Watchers: 3