Spark / SPARK-23603

get_json_object drops trailing data when the length of the JSON value falls within a certain range


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0, 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Jackson (>= 2.7.7) fixes a bug that can drop the tail of a value when the value's length falls within a certain range:

      https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7

      https://github.com/FasterXML/jackson-core/issues/307

      spark-shell:

      val value = "x" * 3000
      val json = s"""{"big": "$value"}"""
      spark.sql(s"select length(get_json_object('$json', '$$.big'))").collect
      
      res0: Array[org.apache.spark.sql.Row] = Array([2991])
      

      Expected result: 3000
      Actual result: 2991
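
      The report says the problem appears only when the length is "in a range" but does not state the bounds. A minimal sketch for finding the affected lengths, assuming it is run inside spark-shell (where `spark` is already defined); the helper names `lengthQuery` and `extractedLength` and the window 2900 to 3100 are illustrative choices, not part of the report:

```scala
// Run inside spark-shell, where `spark` is already defined.

// Hypothetical helper: builds the same query as the reproduction above
// for a value of n characters.
def lengthQuery(n: Int): String = {
  val value = "x" * n
  val json = s"""{"big": "$value"}"""
  s"select length(get_json_object('$json', '$$.big'))"
}

// Hypothetical helper: length reported by Spark, or None if the result is null.
def extractedLength(n: Int): Option[Int] = {
  val row = spark.sql(lengthQuery(n)).head
  if (row.isNullAt(0)) None else Some(row.getInt(0))
}

// The bounds 2900 to 3100 are an arbitrary window around the failing case.
val affected = (2900 to 3100).filter(n => !extractedLength(n).contains(n))
println(s"Lengths with truncated output: ${affected.mkString(", ")}")
```

      On a build that does not have the bug, the printed list should be empty.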

      There are two possible fixes:
      1. Bump Jackson from 2.6.7 and 2.6.7.1 to 2.7.7.
      2. Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text).
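
      The second option can be illustrated outside Spark with Jackson's streaming API. This is only a sketch, assuming jackson-core is on the classpath (as it is in spark-shell); it is not Spark's actual get_json_object code, and whether the char[] variant truncates depends on the Jackson version in use:

```scala
import java.io.ByteArrayOutputStream
import com.fasterxml.jackson.core.{JsonEncoding, JsonFactory, JsonToken}

val value = "x" * 3000
val json = s"""{"big": "$value"}"""

val factory = new JsonFactory()
val parser = factory.createParser(json)

// Advance to the first string value (the "big" field in this input).
var token = parser.nextToken()
while (token != null && token != JsonToken.VALUE_STRING) token = parser.nextToken()

val out = new ByteArrayOutputStream()
val generator = factory.createGenerator(out, JsonEncoding.UTF8)

// Second option from the list above: pass the text as a String instead of
// writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength).
generator.writeRaw(parser.getText)
generator.close()
parser.close()

println(s"bytes written: ${out.size()}")
```

      Comparing the printed byte count with the length of `value` shows whether the full string was emitted.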

       

          People

            Assignee: Unassigned
            Reporter: dzcxzl