Hive / HIVE-26958

JsonSerDe data corruption when scalar type is a json object



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: File Formats



      JsonSerDe uses the Jackson JsonParser.getText method to decode scalar values from JSON strings. The problem is that this method converts any token to text, including START_OBJECT ('{'). As a result, when the value of a scalar field is actually a JSON object, JsonSerDe processes the opening curly bracket as the value for BOOLEAN, DECIMAL, CHAR, VARCHAR, and VARBINARY. It then continues processing the fields inside the nested JSON object as if they were part of the outer JSON object. When the closing curly bracket of the nested object is encountered, the parser pops a level, which can end parsing early. This bug results in corrupted data for the following JSON:


      { "boolean_field" : {}, "other_field" : 99 }
        => [boolean_field=false, other_field=null]
      { "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
        => [boolean_field=false, other_field=42]


      Note that when a JSON array is passed instead of an object, you get an error, because the array does not contain fields, which the code checks for.

      I think this case should result in an error, like the one you get when a JSON array is the field value for a scalar. If so, the fix is to make sure the value token is a scalar for non-complex types in extractCurrentField, with something like this:

      if (!hcatFieldSchema.isComplex() && !valueToken.isScalarValue()) {
          throw new IOException(type + " value must be a scalar json value");
      }
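
To make the proposed guard concrete, here is a minimal, self-contained sketch of the check in isolation. It is not the actual JsonSerDe code: the `Token` enum is a hypothetical stand-in for the relevant subset of Jackson's `JsonToken` (so the example compiles without Jackson on the classpath), and `requireScalar` and its `isComplex` parameter are illustrative names standing in for the check inside extractCurrentField and for `hcatFieldSchema.isComplex()`.

```java
import java.io.IOException;

public class ScalarGuardSketch {

    /** Hypothetical stand-in for a subset of Jackson's JsonToken. */
    public enum Token {
        START_OBJECT(false),
        START_ARRAY(false),
        VALUE_STRING(true),
        VALUE_NUMBER_INT(true),
        VALUE_TRUE(true),
        VALUE_FALSE(true),
        VALUE_NULL(true);

        private final boolean scalar;

        Token(boolean scalar) {
            this.scalar = scalar;
        }

        /** Mirrors JsonToken.isScalarValue(): true only for VALUE_* tokens. */
        public boolean isScalarValue() {
            return scalar;
        }
    }

    /**
     * Rejects a structural token (object or array start) where the column
     * type is not complex, instead of letting it be converted to text.
     * Assumed helper; the name and signature are illustrative only.
     */
    public static void requireScalar(String type, boolean isComplex, Token valueToken)
            throws IOException {
        if (!isComplex && !valueToken.isScalarValue()) {
            throw new IOException(type + " value must be a scalar json value");
        }
    }

    public static void main(String[] args) {
        try {
            // Corresponds to { "boolean_field" : {}, ... } from the report.
            requireScalar("boolean", false, Token.START_OBJECT);
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, a JSON object supplied for a scalar column would fail in the same way the report describes for a JSON array, rather than silently producing wrong values. Complex column types pass through unchanged because the check is skipped when `isComplex` is true.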






            Assignee: Unassigned
            Reporter: Dain Sundstrom