• Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: PySpark
    • Labels:


      In general, we should try to keep up to date with new Py4J releases. The changes in this one are small ( ) and shouldn't impact Spark in any significant way, so I'm going to tag this as a starter issue for someone looking to get a deeper understanding of how PySpark works.

      Upgrading Py4J can be a bit trickier than updating other packages. In general, the steps are:
      1) Upgrade the Py4J version on the Java side
      2) Update the py4j src zip file we bundle with Spark
      3) Make sure everything still works (especially the streaming tests, because we do weird things to make streaming work and it's the most likely place to break during a Py4J upgrade).
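As a rough aid for steps 2 and 3, here is a minimal Python sketch that reads the version out of the bundled source zip and compares it with the version you set on the Java side. It assumes (not verified against the Spark tree) that the zip stores the module at `py4j/version.py` with a `__version__ = "..."` line; the helper names and the example path and version are hypothetical.

```python
import re
import zipfile


def bundled_py4j_version(zip_path: str) -> str:
    """Return __version__ from py4j/version.py inside a Py4J source zip.

    Assumption: the zip contains the module at 'py4j/version.py'.
    """
    with zipfile.ZipFile(zip_path) as zf:
        source = zf.read("py4j/version.py").decode("utf-8")
    match = re.search(r"""__version__\s*=\s*['"]([^'"]+)['"]""", source)
    if match is None:
        raise ValueError(f"no __version__ found in {zip_path}")
    return match.group(1)


def check_versions(zip_path: str, expected: str) -> None:
    """Raise if the bundled Python side does not match the JVM-side version."""
    found = bundled_py4j_version(zip_path)
    if found != expected:
        raise RuntimeError(f"bundled py4j is {found}, Java side expects {expected}")
```

For example, `check_versions("python/lib/py4j-0.10.4-src.zip", "0.10.4")` (hypothetical path and version) raises if steps 1 and 2 drifted apart. It only catches a version mismatch; it is no substitute for actually running the tests, streaming included.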

      You can see how these bits have been done in past releases by looking in the git log for the last time we changed the Py4J version numbers. Sometimes, even for "compatible" releases like this one, we may need to make small code changes inside PySpark because we hook into Py4J's internals, but I don't think that will be the case here.
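Besides reading the git log, a quick way to find every place that still names the old version is a plain text scan of the checkout. The sketch below is hypothetical: the helper name, the skipped directories, and the example version string are assumptions, not part of Spark's tooling. It reports files whose name or UTF-8 content contains the old version string.

```python
import os
from typing import List, Tuple


def files_mentioning(root: str, old_version: str,
                     skip_dirs: Tuple[str, ...] = (".git", "target")) -> List[str]:
    """Return sorted relative paths under root that mention old_version."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories we never want to edit (VCS metadata, build output).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if old_version in name:
                # Versioned file name, e.g. a bundled source zip.
                hits.append(rel)
                continue
            try:
                with open(path, encoding="utf-8") as fh:
                    if old_version in fh.read():
                        hits.append(rel)
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
    return sorted(hits)
```

Running `files_mentioning(".", "0.10.3")` from the repository root (hypothetical old version) gives a checklist to compare against the files touched by the previous upgrade commit. `git grep -l <old-version>` gives similar results for tracked file contents; the Python version also catches versioned file names.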




            • Assignee:
              Jagadeesan A S
            • Reporter:
              holdenk
            • Votes:
              0
            • Watchers:
              2


              • Created: