Spark / SPARK-39301

Leverage LocalRelation in createDataFrame with Arrow optimization

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, createDataFrame in PySpark with Arrow optimization uses LogicalRDD, which always creates an RDD. Spark SQL has some nice optimizations for LocalRelation. We should leverage LocalRelation in createDataFrame with Arrow optimization to speed it up.
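
To see which relation backs a DataFrame built through the Arrow path, one option is to capture the output of `explain(extended=True)` and look for the node name. The following is a minimal sketch, not the change made for this ticket. The helper names `relation_kind` and `explain_arrow_create_dataframe` are hypothetical. It assumes that `pyspark`, `pandas`, and `pyarrow` are installed and that `explain` prints through Python's stdout. The exact node names in the plan text may differ between Spark versions.

```python
import contextlib
import io


def relation_kind(plan_text: str) -> str:
    """Classify an explain() dump by the relation node named in it.

    Hypothetical helper: it only does substring matching on the plan text.
    """
    if "LocalRelation" in plan_text:
        return "LocalRelation"
    if "LogicalRDD" in plan_text:
        return "LogicalRDD"
    return "unknown"


def explain_arrow_create_dataframe() -> str:
    """Build a small DataFrame from pandas with Arrow enabled; return its plans."""
    # Imported lazily so relation_kind() stays usable without Spark installed.
    import pandas as pd
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        # Enables Arrow-based conversion between pandas and Spark DataFrames.
        .config("spark.sql.execution.arrow.pyspark.enabled", "true")
        .getOrCreate()
    )
    try:
        pdf = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
        df = spark.createDataFrame(pdf)  # pandas input takes the Arrow path

        # Assumption: explain() writes via Python's print, so redirecting
        # stdout captures the plan text.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            df.explain(extended=True)
        return buf.getvalue()
    finally:
        spark.stop()
```

Calling `relation_kind(explain_arrow_create_dataframe())` on a build with and without this improvement is one way to compare the two plans; timing `createDataFrame` on the same input would be needed to measure any actual speedup.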

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)