Details
Improvement
Status: Resolved
Major
Resolution: Fixed
3.4.0
Description
Under `com.google.protobuf`, there are several well-known wrapper types for primitives. They are useful for distinguishing between the absence of a primitive field and its default value, as well as for use within `google.protobuf.Any` types. These types are:
DoubleValue, FloatValue, Int64Value, UInt64Value, Int32Value, UInt32Value, BoolValue, StringValue, BytesValue
Currently, when we deserialize these from a serialized protobuf into a Spark struct, we expand them as if they were normal messages. Concretely, if we have
syntax = "proto3";
import "google/protobuf/wrappers.proto";
message WktExample {
  google.protobuf.BoolValue bool_val = 1;
  google.protobuf.Int32Value int32_val = 2;
}
And a message like
WktExample(true, 100)
Then the behavior today is to deserialize this as
{"bool_val": {"value": true}, "int32_val": {"value": 100}}
This is quite difficult to work with and not in the spirit of the wrapper type, so it would be nice to deserialize as
{"bool_val": true, "int32_val": 100}
This is also the behavior of other popular deserialization libraries, including `JsonFormat` in the Java protobuf util package and Go's `jsonpb`.
So for consistency with other libraries and improved usability, I propose we deserialize well known types in this way.
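To make the proposed mapping concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the function `unwrap_wrappers` and the field-type map `WKT_EXAMPLE_FIELD_TYPES` are hypothetical names introduced only for illustration, and the sketch assumes the record has already been deserialized into nested dicts in today's expanded form. It handles top-level fields only.

```python
# Fully qualified names of the well-known wrapper types.
WRAPPER_TYPES = {
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
}


def unwrap_wrappers(record, field_types):
    """Collapse wrapper-typed fields from {"value": x} to x.

    record: dict in the expanded form produced today.
    field_types: hypothetical map of field name -> protobuf type name.
    """
    result = {}
    for name, value in record.items():
        if field_types.get(name) in WRAPPER_TYPES:
            # An unset wrapper stays None, so absence remains
            # distinguishable from a default such as 0 or False.
            result[name] = None if value is None else value.get("value")
        else:
            result[name] = value
    return result


# Hypothetical schema description for the WktExample message above.
WKT_EXAMPLE_FIELD_TYPES = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}

current = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flattened = unwrap_wrappers(current, WKT_EXAMPLE_FIELD_TYPES)
print(flattened)  # {'bool_val': True, 'int32_val': 100}
```

Note that an unset wrapper maps to a null while a wrapper holding a default value maps to that value, which preserves the distinction the wrapper types exist to express.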