Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Not A Problem
- Affects Version/s: 1.5.0
- Fix Version/s: None
- Component/s: None
Description
Using a defined schema to load a JSON RDD works as expected, but loading the same JSON records from a file does not apply the supplied schema. In particular, the nullable flag is ignored: loading from a file reports nullable = true on all fields regardless of the schema passed to the reader.
Code to reproduce:
import org.apache.spark.sql.types._

val jsonRdd = sc.parallelize(List(
  """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}""",
  """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}"""))

val mySchema = StructType(Array(
  StructField(name = "OrderID", dataType = LongType, nullable = false),
  StructField("CustomerID", IntegerType, false),
  StructField("OrderDate", DateType, false),
  StructField("ProductCode", StringType, false),
  StructField("Qty", IntegerType, false),
  StructField("Discount", FloatType, true),
  StructField("expressDelivery", BooleanType, true)))

val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
val schema1 = myDF.printSchema

val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
val schema2 = dfDFfromFile.printSchema
Orders.json
{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": "WQT648", "Qty": 5}
{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}
The behavior should be consistent.
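To make the inconsistency explicit, the two DataFrames from the reproduction above can be compared field by field. This is a sketch building on the values (myDF, dfDFfromFile) defined in the reproduction, and it assumes the behavior described in this report: per the description, every field of the file-based DataFrame comes back with nullable = true, so each non-nullable field should be printed as a mismatch.

```scala
// Pair up corresponding fields of the two schemas and report any
// fields whose nullability disagrees between the RDD-based and
// file-based loads.
val mismatches = myDF.schema.fields.zip(dfDFfromFile.schema.fields)
  .filter { case (rddField, fileField) => rddField.nullable != fileField.nullable }

mismatches.foreach { case (rddField, fileField) =>
  println(s"${rddField.name}: rdd nullable=${rddField.nullable}, " +
    s"file nullable=${fileField.nullable}")
}
```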
Issue Links
- relates to SPARK-23173: from_json can produce nulls for fields which are marked as non-nullable (Resolved)
- relates to SPARK-25545: CSV loading with DROPMALFORMED mode doesn't correctly drop rows that do not conform to non-nullable schema fields (Resolved)