Spark / SPARK-22216 Improving PySpark/Pandas interoperability / SPARK-22324

Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      Arrow version 0.8.0 is slated for release in early November, but I'd like to start the discussion now to help get all the work that's being done synced up.

      Along with upgrading the Arrow Java artifacts, pyarrow on our Jenkins test envs will need to be upgraded as well, which will take a fair amount of work and planning.

      One topic I'd like to discuss is whether pyarrow should be an installation requirement for pyspark, i.e. when a user pip installs pyspark, it would also install pyarrow. If not, then is there a minimum version that needs to be supported? We currently have 0.4.1 installed on Jenkins.

      There are a number of improvements and cleanups in the current code that can happen depending on what we decide (I'll link them all here later, but off the top of my head):
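      If pyarrow ends up as an optional rather than a required dependency, the usual pattern is to check for it (and its version) lazily, at the point where an Arrow-based feature is first used. A minimal sketch of such a guard is below; the function names and the error messages are illustrative, not Spark's actual API, and only the minimum version 0.8.0 comes from this issue.

      ```python
      # Hypothetical sketch of a lazy minimum-version guard for an
      # optional pyarrow dependency. Names here are illustrative.

      def parse_version(version):
          """Parse a 'major.minor.patch' string into a comparable int tuple."""
          return tuple(int(part) for part in version.split(".")[:3])

      def require_minimum_pyarrow_version(minimum="0.8.0"):
          """Raise ImportError if pyarrow is missing or older than `minimum`."""
          try:
              import pyarrow
          except ImportError:
              raise ImportError(
                  "pyarrow >= %s must be installed to use Arrow-based "
                  "features; it was not found." % minimum
              )
          if parse_version(pyarrow.__version__) < parse_version(minimum):
              raise ImportError(
                  "pyarrow >= %s must be installed; found %s."
                  % (minimum, pyarrow.__version__)
              )
      ```

      Calling this at the entry point of each Arrow code path keeps plain pyspark usable without pyarrow, while giving a clear error (instead of an obscure failure deep in serialization) when an older version such as the 0.4.1 currently on Jenkins is present.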

              People

              • Assignee:
                bryanc Bryan Cutler
              • Reporter:
                bryanc Bryan Cutler
              • Votes:
                1
              • Watchers:
                6
