Spark / SPARK-16700

StructType doesn't accept Python dicts anymore

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: PySpark
    • Labels:

      Description

      Hello,

      I found this issue while testing my codebase with 2.0.0-rc5.

      StructType in Spark 1.6.2 accepts Python dicts as row values, which is very handy. 2.0.0-rc5 does not and raises a TypeError instead.

      I don't know whether this was intended, but I'd advocate for keeping the old behaviour. MapType is probably wasteful when the key names never change, and switching to Python tuples would be cumbersome.

      Here is a minimal script to reproduce the issue:

      from pyspark import SparkContext
      from pyspark.sql import types as SparkTypes
      from pyspark.sql import SQLContext
      
      
      sc = SparkContext()
      sqlc = SQLContext(sc)
      
      struct_schema = SparkTypes.StructType([
          SparkTypes.StructField("id", SparkTypes.LongType())
      ])
      
      rdd = sc.parallelize([{"id": 0}, {"id": 1}])
      
      df = sqlc.createDataFrame(rdd, struct_schema)
      
      print(df.collect())
      
      # 1.6.2 prints [Row(id=0), Row(id=1)]
      
      # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in type <type 'dict'>
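
For anyone hitting this on 2.0.0, a possible stopgap is to convert each dict into a tuple ordered by the schema's field names before calling createDataFrame. The sketch below is only an illustration of that idea: `dict_to_row_tuple` is a hypothetical helper, not part of PySpark, and the field list is hard-coded so the snippet runs without a Spark installation. With the real schema it could be taken from `[f.name for f in struct_schema.fields]`.

```python
def dict_to_row_tuple(record, field_names):
    """Hypothetical helper: order a dict's values to match the schema's field order.

    Keys missing from the dict become None.
    """
    return tuple(record.get(name) for name in field_names)


# Assumption: stands in for [f.name for f in struct_schema.fields]
field_names = ["id"]

records = [{"id": 0}, {"id": 1}]
rows = [dict_to_row_tuple(r, field_names) for r in records]
print(rows)  # [(0,), (1,)]

# With Spark available, the same mapping could be applied to the RDD
# from the script above (not executed here):
#   names = [f.name for f in struct_schema.fields]
#   rdd = sc.parallelize(records).map(lambda r: dict_to_row_tuple(r, names))
#   df = sqlc.createDataFrame(rdd, struct_schema)
```

This keeps the StructType schema unchanged, at the cost of an extra map over the data, which is the cumbersome part mentioned above.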
      
      

      Thanks!

            People

            • Assignee: Davies Liu (davies)
            • Reporter: Sylvain Zimmer (sylvinus)
            • Votes: 1
            • Watchers: 6