Spark / SPARK-12537

Add option to accept quoting of all character backslash quoting mechanism


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      We can provide an option that controls whether the JSON parser accepts backslash quoting of any character.

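      For example, if a JSON file contains a backslash escape that is not listed in the JSON specification (which only allows \", \\, \/, \b, \f, \n, \r, \t and \uXXXX), such as \$ below, that record is returned as _corrupt_record.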

      JSON File
      {"name": "Cazen Lee", "price": "$10"}
      {"name": "John Doe", "price": "\$20"}
      {"name": "Tracy", "price": "$10"}
      

      Without the option, the second line fails to parse: its raw text goes to _corrupt_record and its name and price are null.

      scala> df.show
      +--------------------+---------+-----+
      |     _corrupt_record|     name|price|
      +--------------------+---------+-----+
      |                null|Cazen Lee|  $10|
      |{"name": "John Do...|     null| null|
      |                null|    Tracy|  $10|
      +--------------------+---------+-----+
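
The escape rule behind this can be sketched in plain Scala. Spark's JSON reader is built on Jackson, whose parser feature `JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER` provides the lenient behaviour; the code below is not Spark's or Jackson's implementation but a minimal illustration, and the names `EscapeRules`, `decode` and `allowAnyEscape` are hypothetical. It decodes the text between a string's surrounding quotes: with the flag off only the specification's escapes are accepted, and with it on an unknown escape such as `\$` yields the escaped character itself.

```scala
// Minimal sketch (hypothetical, not Spark's or Jackson's code): decode the text
// between the surrounding double quotes of a JSON string literal.
object EscapeRules {
  // Single-character escapes allowed by the JSON specification; the
  // four-hex-digit unicode escape is handled separately in decode.
  private val standard: Map[Char, Char] = Map(
    '"' -> '"', '\\' -> '\\', '/' -> '/',
    'b' -> '\b', 'f' -> '\f', 'n' -> '\n', 'r' -> '\r', 't' -> '\t'
  )

  /** Right(decoded text) on success, Left(error message) on a bad escape.
    * With allowAnyEscape = true, an unrecognized escape such as backslash-dollar
    * decodes to the escaped character itself instead of failing. */
  def decode(body: String, allowAnyEscape: Boolean): Either[String, String] = {
    val out = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c != '\\') { out.append(c); i += 1 }
      else if (i + 1 >= body.length) return Left("dangling backslash at " + i)
      else {
        val e = body.charAt(i + 1)
        if (standard.contains(e)) { out.append(standard(e)); i += 2 }
        else if (e == 'u') {
          val hex = body.slice(i + 2, i + 6)
          if (hex.length == 4 && hex.forall(h => Character.digit(h, 16) >= 0)) {
            out.append(Integer.parseInt(hex, 16).toChar); i += 6
          } else return Left("malformed unicode escape at " + i)
        }
        else if (allowAnyEscape) { out.append(e); i += 2 }           // lenient: keep the char
        else return Left("unrecognized escape '" + e + "' at " + i) // strict: reject
      }
    }
    Right(out.toString)
  }
}
```

For the price text `\$20`, `decode` returns a `Left` when `allowAnyEscape = false` and `Right("$20")` when it is `true`, which is the difference between the two tables in this issue.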
      

      After applying this patch, the allowBackslashEscapingAnyCharacter option can be enabled as shown below:

      scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
      df: org.apache.spark.sql.DataFrame = [name: string, price: string]
      
      scala> df.show
      +---------+-----+
      |     name|price|
      +---------+-----+
      |Cazen Lee|  $10|
      | John Doe|  $20|
      |    Tracy|  $10|
      +---------+-----+
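
The per-record behaviour in the two tables (a bad line lands in the corrupt-record column while the other lines still parse) can be sketched as a tiny line-by-line reader. This is an illustration under stated assumptions, not Spark's reader: `ParsedRow`, `FlatJsonReader` and `resolveAllowAnyEscape` are hypothetical names, only flat objects with string values are supported, and the `name` and `price` columns are hard-coded instead of inferred.

```scala
// Hypothetical sketch, not Spark's implementation: read one JSON object per line
// and keep unparseable lines in a corrupt-record column.
final case class ParsedRow(name: Option[String], price: Option[String], corrupt: Option[String])

object FlatJsonReader {
  private val standard = Map('"' -> '"', '\\' -> '\\', '/' -> '/',
    'b' -> '\b', 'f' -> '\f', 'n' -> '\n', 'r' -> '\r', 't' -> '\t')

  // Hypothetical option lookup: absent means false; "true"/"false" are parsed
  // case-insensitively and any other value throws IllegalArgumentException.
  def resolveAllowAnyEscape(options: Map[String, String]): Boolean =
    options.get("allowBackslashEscapingAnyCharacter").exists(_.toBoolean)

  // Parses {"k": "v", ...}; returns None on any syntax or escape error.
  def parseObject(line: String, allowAnyEscape: Boolean): Option[Map[String, String]] = {
    var i = 0
    def skipWs(): Unit = while (i < line.length && line(i).isWhitespace) i += 1
    def accept(c: Char): Boolean = {
      skipWs()
      if (i < line.length && line(i) == c) { i += 1; true } else false
    }
    def string(): Option[String] = {
      if (!accept('"')) return None
      val sb = new StringBuilder
      while (i < line.length && line(i) != '"') {
        if (line(i) != '\\') { sb.append(line(i)); i += 1 }
        else if (i + 1 >= line.length) return None
        else {
          val e = line(i + 1)
          if (standard.contains(e)) { sb.append(standard(e)); i += 2 }
          else if (e == 'u') {
            val hex = line.slice(i + 2, i + 6)
            if (hex.length != 4 || !hex.forall(h => Character.digit(h, 16) >= 0)) return None
            sb.append(Integer.parseInt(hex, 16).toChar); i += 6
          }
          else if (allowAnyEscape) { sb.append(e); i += 2 } // lenient: keep the char
          else return None                                   // strict: reject the record
        }
      }
      if (i >= line.length) None else { i += 1; Some(sb.toString) }
    }

    if (!accept('{')) return None
    var fields = Map.empty[String, String]
    if (!accept('}')) {
      var done = false
      while (!done) {
        val key = string()
        if (key.isEmpty || !accept(':')) return None
        val value = string()
        if (value.isEmpty) return None
        fields += key.get -> value.get
        if (accept('}')) done = true
        else if (!accept(',')) return None
      }
    }
    skipWs()
    if (i == line.length) Some(fields) else None
  }

  // Each line is parsed independently, so one bad record does not affect the others.
  def read(lines: Seq[String], allowAnyEscape: Boolean): Seq[ParsedRow] =
    lines.map { line =>
      parseObject(line, allowAnyEscape) match {
        case Some(f) => ParsedRow(f.get("name"), f.get("price"), None)
        case None    => ParsedRow(None, None, Some(line))
      }
    }
}
```

Run on the three lines of the example file with the flag off, only the John Doe line ends up in `corrupt`; with `resolveAllowAnyEscape(Map("allowBackslashEscapingAnyCharacter" -> "true"))` all three lines parse and John Doe's price becomes `$20`.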
      

      This issue is similar to HIVE-11825 and HIVE-12717.


            People
              Assignee: Cazen Lee
              Reporter: Cazen Lee
              Votes: 0
              Watchers: 3
