Spark / SPARK-39494

Support `createDataFrame` from a list of scalars when schema is not provided

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, PySpark cannot create a DataFrame from a list of native Python scalars when no schema is provided. For example:

      >>> spark.createDataFrame([1, 2]).collect()
      Traceback (most recent call last):
      ...
      TypeError: Can not infer schema for type: <class 'int'>

      However, the Scala DataFrame API supports this:

      scala> Seq(1, 2).toDF().collect()
      res6: Array[org.apache.spark.sql.Row] = Array([1], [2])

      To keep the two APIs consistent, we propose supporting DataFrame creation from a list of scalars in PySpark as well.

      See more [here](https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing).
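
Until this is supported, one possible workaround is to wrap each scalar in a one-element tuple so that every item looks like a single-column row. The sketch below shows only that wrapping step in plain Python. The helper name `wrap_scalars` is hypothetical and not part of PySpark.

```python
from typing import Any, List, Tuple


def wrap_scalars(values: List[Any]) -> List[Tuple[Any]]:
    """Wrap each scalar in a one-element tuple (hypothetical helper, not a PySpark API)."""
    return [(v,) for v in values]


# Each scalar becomes a single-column row.
rows = wrap_scalars([1, 2])
print(rows)  # [(1,), (2,)]
```

With an active `SparkSession` named `spark`, the wrapped list can then be passed along with a column name, for example `spark.createDataFrame(wrap_scalars([1, 2]), ["value"])`. The column name `value` is an assumption chosen for illustration.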

          People

            Assignee: Unassigned
            Reporter: Xinrong Meng (XinrongM)