Spark / SPARK-13802

Fields order in Row(**kwargs) is not consistent with Schema.toInternal method


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

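      When a Row is constructed from keyword arguments, the fields in the underlying tuple are sorted by name. When the schema reads the row, it does not use that sorted order: the values appear to be matched to the schema's fields by position, so they end up under the wrong columns.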

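      from pyspark.sql import Row
      from pyspark.sql.types import StructType, StructField, StringType
      
      # The schema declares "id" first, then "first_name".
      schema = StructType([
          StructField("id", StringType()),
          StructField("first_name", StringType())])
      # Row(**kwargs) sorts its fields by name, so "first_name" comes first.
      row = Row(id="39", first_name="Szymon")
      schema.toInternal(row)
      # Out[5]: ('Szymon', '39')
      
      # sqlContext is the SQLContext predefined by the PySpark shell.
      df = sqlContext.createDataFrame([row], schema)
      df.show(1)
      
      # +------+----------+
      # |    id|first_name|
      # +------+----------+
      # |Szymon|        39|
      # +------+----------+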
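
The mismatch can be reproduced without a Spark installation. The following is a minimal plain-Python sketch of the behaviour reported above, not Spark's actual implementation: `make_row`, `to_internal_positional`, and `reorder_to_schema` are hypothetical helpers introduced only for illustration. It assumes that the kwargs constructor sorts field names alphabetically and that the schema consumes the tuple by position, and it shows one possible way to realign values by name before handing them to a schema.

```python
# Plain-Python model of the reported behaviour; no PySpark required.
# All three helpers are hypothetical stand-ins, not Spark APIs.

schema_names = ["id", "first_name"]  # field order declared in the schema


def make_row(**kwargs):
    """Mimic Row(**kwargs) as described: names sorted, values follow the names."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)


def to_internal_positional(values):
    """Mimic a positional read: the tuple is taken as-is, in its own order."""
    return tuple(values)


def reorder_to_schema(names, values, schema_names):
    """Workaround sketch: look up each schema field by name, not by position."""
    by_name = dict(zip(names, values))
    return tuple(by_name[n] for n in schema_names)


names, values = make_row(id="39", first_name="Szymon")
mismatched = to_internal_positional(values)
aligned = reorder_to_schema(names, values, schema_names)

print(mismatched)  # values in sorted-name order: first_name, then id
print(aligned)     # values in schema order: id, then first_name
```

In real PySpark code the same idea would mean passing values in the schema's declared order (for example as a plain tuple or list) instead of relying on the order produced by `Row(**kwargs)`.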
      

People

    • Assignee: Unassigned
    • Reporter: Szymon Matejczyk (szymonm)