Details
Improvement
Status: In Progress
Major
Resolution: Unresolved
3.4.0
None
None
Description
Currently, PySpark does not support creating a DataFrame from a list of native Python scalars. For example:
>>> spark.createDataFrame([1, 2]).collect()
Traceback (most recent call last):
...
TypeError: Can not infer schema for type: <class 'int'>
However, the Spark DataFrame Scala API supports this:
scala> Seq(1, 2).toDF().collect()
res6: Array[org.apache.spark.sql.Row] = Array([1], [2])
To maintain API consistency, we propose to support DataFrame creation from a list of scalars.
See the [design document](https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing) for more details.
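Until this is supported, one way to work around the error is to wrap each scalar in a one-element tuple before calling createDataFrame, or to pass an explicit atomic type as the schema. The sketch below is illustrative only: `wrap_scalars` is a hypothetical helper, not part of PySpark, and the commented calls assume an active SparkSession bound to `spark`.

```python
from typing import Any, List, Tuple


def wrap_scalars(values: List[Any]) -> List[Tuple[Any]]:
    """Hypothetical helper: turn [1, 2] into [(1,), (2,)].

    Each scalar becomes a single-field row, so schema inference sees
    tuples rather than bare ints.
    """
    return [(v,) for v in values]


rows = wrap_scalars([1, 2])
print(rows)  # [(1,), (2,)]

# Assuming an active SparkSession named `spark` (not created here):
#
#   spark.createDataFrame(rows, ["value"]).collect()
#
# Alternatively, pass an atomic type as the schema; PySpark wraps it in a
# single-field struct whose field is named "value":
#
#   spark.createDataFrame([1, 2], "int").collect()
```

The column name "value" is chosen here to match the name the Scala `toDF()` call gives a single-column Dataset; any other name would work for the tuple form.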