SPARK-44001

Improve parsing of well known wrapper types

Details

    Description

      Under `com.google.protobuf`, there are several well-known wrapper types for primitives. They are useful for distinguishing an absent primitive field from one set to its default value, and for use within `google.protobuf.Any` types. These types are:

      DoubleValue
      FloatValue
      Int64Value
      UInt64Value
      Int32Value
      UInt32Value
      BoolValue
      StringValue
      BytesValue
      

      Currently, when we deserialize these from a serialized protobuf into a Spark struct, we expand them as if they were ordinary messages. Concretely, if we have

      syntax = "proto3";
      
      import "google/protobuf/wrappers.proto";
      
      message WktExample {
        google.protobuf.BoolValue bool_val = 1;
        google.protobuf.Int32Value int32_val = 2;
      }
      

      And a message like

      WktExample(true, 100)
      

      Then the behavior today is to deserialize this as:

      {"bool_val": {"value": true}, "int32_val": {"value": 100}}
      

      This is awkward to work with and not in the spirit of the wrapper types, so it would be nice to deserialize it as

      {"bool_val": true, "int32_val": 100}
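
      As a rough illustration of the proposed mapping, here is a minimal Python sketch that collapses the expanded form into the flat one. It is not Spark's implementation: the `unwrap` helper, the `WRAPPER_TYPES` set, and the `field_types` lookup are hypothetical names introduced only for this example, and rows are modeled as plain dicts rather than Spark structs.

```python
# Hypothetical sketch: none of these names come from Spark or protobuf APIs.
WRAPPER_TYPES = {
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
}


def unwrap(row, field_types):
    """Replace {"value": x} with x for fields declared as wrapper types.

    row         -- dict modeling the expanded struct produced today
    field_types -- dict mapping field name to its fully qualified proto type
    """
    out = {}
    for name, val in row.items():
        if field_types.get(name) in WRAPPER_TYPES and val is not None:
            # A present wrapper collapses to its inner primitive.
            out[name] = val.get("value")
        else:
            # Absent wrappers stay None; non-wrapper fields pass through.
            out[name] = val
    return out


field_types = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}
expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flat = unwrap(expanded, field_types)
print(flat)
```

      Note that an absent wrapper stays null while a wrapper holding the default (for example `{"value": 0}`) becomes `0`, so the flat form still preserves the absent-versus-default distinction that the wrapper types exist for.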
      

      This is also the behavior of other popular deserialization libraries, including `JsonFormat` from Java's protobuf util package and Go's `jsonpb`.

      So, for consistency with other libraries and improved usability, I propose we deserialize the well-known wrapper types in this way.

          People

            justaparth Parth Upadhyay