Spark / SPARK-10848

Applied JSON Schema Works for json RDD but not when loading json file


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

Description

    Loading a JSON RDD with a user-defined schema works as expected. Loading the same JSON records from a file does not fully apply the supplied schema: the nullable flags are not honored. When loading from a file, every field is marked nullable = true, regardless of the schema supplied.
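    Rather than comparing printSchema output by eye, the nullable flags of the requested schema and the loaded schema can be compared field by field. The following is a minimal sketch that relies only on the StructType/StructField classes from org.apache.spark.sql.types; the names NullabilityCheck and nullabilityMismatches are hypothetical helpers, not part of Spark.

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper, not a Spark API.
object NullabilityCheck {
  // Returns one message per field whose nullable flag in `actual` differs
  // from the flag declared in `expected`. Fields missing from `actual`
  // are ignored; only the nullable flag is compared, not the data type.
  def nullabilityMismatches(expected: StructType, actual: StructType): Seq[String] = {
    val actualByName: Map[String, StructField] =
      actual.fields.map(f => f.name -> f).toMap

    expected.fields.toSeq.flatMap { e =>
      actualByName.get(e.name) match {
        case Some(a) if a.nullable != e.nullable =>
          Seq(s"${e.name}: declared nullable=${e.nullable}, loaded as nullable=${a.nullable}")
        case _ =>
          Seq.empty[String]
      }
    }
  }
}
```

    For example, per the behavior reported in this issue, NullabilityCheck.nullabilityMismatches(mySchema, dfDFfromFile.schema) would be expected to list the five fields declared nullable = false in the reproduction code, while the same call with myDF.schema should return an empty sequence.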

    Code to reproduce:

        // Run in spark-shell, where `sc` and `sqlContext` are predefined.
        import org.apache.spark.sql.types._

        // Two sample order records; only the second one has the optional fields.
        val jsonRdd = sc.parallelize(List(
          """{"OrderID": 1, "CustomerID": 452, "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
          """{"OrderID": 2, "CustomerID": 16, "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount": 0.25, "expressDelivery": true}"""))

        // Required fields are declared nullable = false, optional ones nullable = true.
        val mySchema = StructType(Array(
          StructField("OrderID", LongType, nullable = false),
          StructField("CustomerID", IntegerType, nullable = false),
          StructField("OrderDate", DateType, nullable = false),
          StructField("ProductCode", StringType, nullable = false),
          StructField("Qty", IntegerType, nullable = false),
          StructField("Discount", FloatType, nullable = true),
          StructField("expressDelivery", BooleanType, nullable = true)
        ))

        // Loading from an RDD[String]: the printed schema keeps the declared nullable flags.
        val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
        myDF.printSchema()

        // Loading the same records from a file: every field is printed as nullable = true.
        val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
        dfDFfromFile.printSchema()
      

    Contents of Orders.json (one JSON object per line):

        {"OrderID": 1, "CustomerID": 452, "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
        {"OrderID": 2, "CustomerID": 16, "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount": 0.25, "expressDelivery": true}
      

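    The behavior should be consistent: the nullable flags of the supplied schema should be honored whether the records are loaded from an RDD or from a file.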
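    If the stricter nullable flags are needed after loading from a file, one possible workaround, offered here as an assumption rather than as the resolution recorded on this issue, is to rebuild the DataFrame from its underlying RDD[Row] with the intended schema via SQLContext.createDataFrame(rowRDD, schema). The names ReapplySchema and withSchema are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not a Spark API.
object ReapplySchema {
  // Rebuilds `df` on top of its own RDD[Row] so that the result reports the
  // nullable flags declared in `schema`. The schema must list the same
  // columns, in the same order and with the same types, as `df`.
  // The flags are not validated up front: if a column declared
  // nullable = false actually contains nulls, behavior depends on the Spark
  // version and may include runtime errors or incorrect results.
  def withSchema(df: DataFrame, schema: StructType): DataFrame =
    df.sqlContext.createDataFrame(df.rdd, schema)
}
```

    For example, ReapplySchema.withSchema(dfDFfromFile, mySchema).printSchema() would be expected to show the nullable flags from mySchema. The column order matches here because the file was read with mySchema itself.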

People

    • Assignee: Unassigned
    • Reporter: Miklos Christine (mwc)
    • Votes: 2
    • Watchers: 13
