Description
We can provide an option that controls whether the JSON parser accepts backslash quoting of any character.
For example, if a JSON file contains a backslash escape that is not listed in the JSON specification (such as \$), the affected record is returned in _corrupt_record.
JSON File
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
_corrupt_record (name and price return null for the escaped record)
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
After applying this patch, we can enable the allowBackslashEscapingAnyCharacter option as shown below:
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]

scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
This issue is similar to HIVE-11825 and HIVE-12717.
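To illustrate the parser behaviour behind the option, here is a minimal sketch that parses the "John Doe" record directly with Jackson, toggling the Jackson feature ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER. It assumes Jackson 2.x (jackson-core) is on the classpath; the object name BackslashEscapeDemo and the helper parsePrice are hypothetical names used only for this illustration and are not part of the patch.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}
import scala.util.Try

// Hypothetical demo object, not part of the patch.
object BackslashEscapeDemo {
  // The record from the JSON file above; "\$" is not an escape listed in the JSON specification.
  // A triple-quoted Scala string keeps the backslash literally.
  val line: String = """{"name": "John Doe", "price": "\$20"}"""

  // Parses `line` and returns the value of the "price" field, if present.
  // Any parse error is captured as a Failure instead of being thrown.
  def parsePrice(allowAnyEscape: Boolean): Try[Option[String]] = Try {
    val factory = new JsonFactory()
    factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, allowAnyEscape)
    val parser = factory.createParser(line)
    try {
      var price: Option[String] = None
      var token = parser.nextToken()
      while (token != null) {
        // Only string values whose enclosing field name is "price" are of interest.
        if (token == JsonToken.VALUE_STRING && parser.getCurrentName == "price") {
          price = Some(parser.getText)
        }
        token = parser.nextToken()
      }
      price
    } finally {
      parser.close()
    }
  }
}
```

With this sketch, BackslashEscapeDemo.parsePrice(allowAnyEscape = false) should come back as a Failure, which corresponds to the record landing in _corrupt_record, while parsePrice(allowAnyEscape = true) should yield the price $20.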