Description
Currently, we use LogicalRDD, which always creates an RDD. Spark SQL has some nice optimizations for LocalRelation. We should leverage them in createDataFrame in PySpark with Arrow optimization to speed it up.