SPARK-43051

Allow emitting zero values when deserializing protobuf messages

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: Protobuf
    • Labels: None

    Description

      Currently, when deserializing protobufs using from_protobuf, fields that are not explicitly present in the serialized message are deserialized as null in the resulting struct. However, this also applies to singular proto3 scalars that were explicitly set to their default values, because such fields are not written to the serialized protobuf.

      For example, given a message format like

       

      syntax = "proto3";

      message Person {
        string name = 1;
        int64 age = 2;
        optional string middle_name = 3;
        optional int64 salary = 4;
      }
      

      and an example message like

       

      Person(age = 0, middle_name = "")

      the result from calling from_protobuf on the serialized form of the above message would be

       

      {"name": null, "age": null, "middle_name": "", "salary": null}
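The nulls for name and age follow from the proto3 wire format. As a rough illustration, here is a hand-rolled encoder for the Person message above; it is a sketch, not the protobuf library's own code, and the helper names `encode_varint` and `encode_person` are hypothetical. It assumes the standard proto3 rules: singular scalars are written only when non-default, while `optional` fields are written whenever they are set.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_person(name="", age=0, middle_name=None, salary=None) -> bytes:
    """Illustrative proto3 encoding of Person (hypothetical helper)."""
    out = bytearray()
    # Singular proto3 scalars (implicit presence): skipped when equal to the default.
    if name != "":
        data = name.encode("utf-8")
        out += encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data
    if age != 0:
        out += encode_varint((2 << 3) | 0) + encode_varint(age & 0xFFFFFFFFFFFFFFFF)
    # `optional` fields (explicit presence): written whenever set, even to a default.
    if middle_name is not None:
        data = middle_name.encode("utf-8")
        out += encode_varint((3 << 3) | 2) + encode_varint(len(data)) + data
    if salary is not None:
        out += encode_varint((4 << 3) | 0) + encode_varint(salary & 0xFFFFFFFFFFFFFFFF)
    return bytes(out)


wire = encode_person(age=0, middle_name="")
print(wire)  # b'\x1a\x00' : only middle_name (field 3, length 0) is on the wire
```

Because age = 0 leaves no trace in the bytes, a reader cannot distinguish "set to 0" from "never set" for that field, which is why it currently comes back as null.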

       

      It can be useful to deserialize these fields as their defaults, e.g.:

       

      {"name": "", "age": 0, "middle_name": "", "salary": null}
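The difference between the two outputs can be sketched in plain Python. This is only a model of the intended semantics, not spark-protobuf's implementation: `PERSON_FIELDS`, `to_struct`, and the `emit_defaults` flag are hypothetical names introduced for illustration. The assumption is that fields declared `optional` keep returning null when absent, since their absence is meaningful, while singular scalars fall back to their zero value.

```python
# Hypothetical schema description: field name -> (proto3 zero value, has explicit presence)
PERSON_FIELDS = {
    "name": ("", False),
    "age": (0, False),
    "middle_name": ("", True),  # declared `optional`
    "salary": (0, True),        # declared `optional`
}


def to_struct(present: dict, emit_defaults: bool) -> dict:
    """Build the output row from the fields actually found in the serialized message."""
    row = {}
    for field, (zero, explicit_presence) in PERSON_FIELDS.items():
        if field in present:
            row[field] = present[field]
        elif emit_defaults and not explicit_presence:
            # Absent singular scalar: indistinguishable from its default, so emit it.
            row[field] = zero
        else:
            # Absent `optional` field, or default emission disabled.
            row[field] = None
    return row


# Only middle_name survives serialization of Person(age = 0, middle_name = "").
parsed = {"middle_name": ""}

print(to_struct(parsed, emit_defaults=False))
# {'name': None, 'age': None, 'middle_name': '', 'salary': None}
print(to_struct(parsed, emit_defaults=True))
# {'name': '', 'age': 0, 'middle_name': '', 'salary': None}
```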

       

      This behavior also exists in other major protobuf libraries.

      I propose extending the spark-protobuf library to support this behavior.

       

      PR: https://github.com/apache/spark/pull/40686

          People

            Assignee: Parth Upadhyay (justaparth)
            Reporter: Parth Upadhyay (justaparth)