Spark / SPARK-27712

createDataFrame() reorders row

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:
    • Environment:

      emr-5.20.0

      PySpark 2.4.0

      Python 2.7.15

      Description

      Executing the following:

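      import pyspark.sql.types

      # `spark` is an existing SparkSession, e.g. the one provided by the pyspark shell.
      # Note that the schema lists field "B" before field "A".
      my_schema = pyspark.sql.types.StructType([
          pyspark.sql.types.StructField("B", pyspark.sql.types.StringType(), True),
          pyspark.sql.types.StructField("A", pyspark.sql.types.StringType(), True)
      ])

      spark.createDataFrame(spark.sparkContext.parallelize([pyspark.sql.Row(A="1", B="2")]), my_schema).collect()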
      

      should produce this:

      [Row(A="1", B="2")]
      

      or this:

      [Row(B='2', A='1')]
      

      but it produces this instead, with the values of A and B swapped:

      [Row(B=u'1', A=u'2')]
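
The swapped values are consistent with positional matching. The sketch below is plain Python, not PySpark code. It assumes that `Row(A="1", B="2")` stores its keyword arguments sorted by field name, as documented for `Row` in this PySpark version, and that `createDataFrame()` then assigns the row's values to the schema fields by position rather than by name (an assumption inferred from the output above, not confirmed here). The helpers `make_row`, `apply_schema_by_position`, and `apply_schema_by_name` are hypothetical stand-ins for illustration only.

```python
def make_row(**kwargs):
    """Hypothetical stand-in for Row(**kwargs): fields are sorted by name."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)


def apply_schema_by_position(values, schema_names):
    """Pair the i-th value with the i-th schema field, ignoring row field names."""
    return list(zip(schema_names, values))


def apply_schema_by_name(names, values, schema_names):
    """Look each schema field up by name in the row."""
    lookup = dict(zip(names, values))
    return [(n, lookup[n]) for n in schema_names]


schema_names = ["B", "A"]               # same field order as my_schema
names, values = make_row(A="1", B="2")  # names == ['A', 'B'], values == ('1', '2')

by_position = apply_schema_by_position(values, schema_names)
by_name = apply_schema_by_name(names, values, schema_names)

print(by_position)  # [('B', '1'), ('A', '2')]  matches the reported output
print(by_name)      # [('B', '2'), ('A', '1')]  matches the second expected output
```

Under these assumptions, building the input rows as plain tuples in schema order, for example `("2", "1")` for the fields `B`, `A`, would sidestep the mismatch.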
      

              People

              • Assignee:
                Unassigned
                Reporter:
                tludwinski Tim Ludwinski
              • Votes:
                0
                Watchers:
                2