Thrift
  1. Thrift
  2. THRIFT-1528

Inconsistency in optional fields between Java/C# and python

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8
    • Fix Version/s: 1.0
    • Component/s: Python - Compiler
    • Labels:
      None

      Description

      If a struct contains optional fields with default values the generated python code serialize differently than Java or C# code.

      In Java or C# optional fields are only serialized if a field was set by the client. If not the field is omited during serialization. This is possible because C#/Java maintains for each field a 'isset'-boolean which records if a field was set or not.

      However the generated python code does not have such a 'isset'-structure. It writes every field which is not equal None. As the constructor initialize the optional fields with their default value, these fields are written whether they are set or not.

        Activity

        Hide
        Nate Rosenblum added a comment -

        A similar inconsistency applies to C++ structures, where all optional fields that have default values are serialized (the isset predicate is initialized to true for such fields). These inconsistencies break applications that depend on consistency of serialized messages regardless of platform (e.g., computing HMAC on messages).

        Show
        Nate Rosenblum added a comment - A similar inconsistency applies to C++ structures, where all optional fields that have default values are serialized (the isset predicate is initialized to true for such fields). These inconsistencies break applications that depend on consistency of serialized messages regardless of platform (e.g., computing HMAC on messages).
        Hide
        Will Pierce added a comment -

        i think the java and c# implementations should follow the c++ and python implementations, otherwise it doesn't make sense for a field to have a default value at all, if it isn't included when serializing, does it?

        Show
        Will Pierce added a comment - i think the java and c# implementations should follow the c++ and python implementations, otherwise it doesn't make sense for a field to have a default value at all, if it isn't included when serializing, does it?
        Hide
        Diwaker Gupta added a comment -

        As with most things, there's a tradeoff here. For protocols where most fields are marked as optional and have default values, and you have tight control (and versioning) of the end points, it will be a lot more efficient to omit serializing such fields and have the end points populate them with defaults during deserialization.

        Providing binary compatibility however, is simpler if we serialize default values for optional fields. For instance, what happens if the client does set a field but the value is the same as the default?

        At the end of the day it's more important for Thrift language bindings to be mutually consistent. Will, do you think this is worth discussing on thrift-dev?

        Show
        Diwaker Gupta added a comment - As with most things, there's a tradeoff here. For protocols where most fields are marked as optional and have default values, and you have tight control (and versioning) of the end points, it will be a lot more efficient to omit serializing such fields and have the end points populate them with defaults during deserialization. Providing binary compatibility however, is simpler if we serialize default values for optional fields. For instance, what happens if the client does set a field but the value is the same as the default? At the end of the day it's more important for Thrift language bindings to be mutually consistent. Will, do you think this is worth discussing on thrift-dev?
        Hide
        Stefan Gmeiner added a comment -

        I found the following statement in Thrift Missing Guide:

        Changing a default value is generally OK, as long as you remember that default values are never sent over the wire. Thus, if a program receives a message in which a particular field isn’t set, the program will see the default value as it was defined in that program’s version of the protocol. It will NOT see the default value that was defined in the sender’s code.

        Example:

        If you update a server with a changed default value you will get different behavior depending if you use Java or Python as old client. If the old Java client talks to the new server (without using the default field) the server will see the new default value whereas the the python client will sent the old default value and thus the server will get the old default value.

        I'm not sure which one should be preferred but I think it should not depending on the client language used. I don't see an reason why some language should behave different than the static ones (to ensure binary compatibility the thrift compiler could provide a language option to control the generation of the code).

        Show
        Stefan Gmeiner added a comment - I found the following statement in Thrift Missing Guide : Changing a default value is generally OK, as long as you remember that default values are never sent over the wire. Thus, if a program receives a message in which a particular field isn’t set, the program will see the default value as it was defined in that program’s version of the protocol. It will NOT see the default value that was defined in the sender’s code. Example: If you update a server with a changed default value you will get different behavior depending if you use Java or Python as old client. If the old Java client talks to the new server (without using the default field) the server will see the new default value whereas the the python client will sent the old default value and thus the server will get the old default value. I'm not sure which one should be preferred but I think it should not depending on the client language used. I don't see an reason why some language should behave different than the static ones (to ensure binary compatibility the thrift compiler could provide a language option to control the generation of the code).
        Hide
        Bryan Duxbury added a comment -

        From my perspective, its insane to have a field have a value and not to serialize it. In fact, the whole "isset" thing in Java is just a hack around the fact that you can't have null primitives. I would vote for standardizing on default values being considered set at object creation.

        (An interesting detail is that in Java, the isset field only covers fields that's can't be natively set to null, so a default value for a struct, string, binary, or collection type actually would be serialized, while a bool, byte, short, int, long, or double would not be.)

        Show
        Bryan Duxbury added a comment - From my perspective, its insane to have a field have a value and not to serialize it. In fact, the whole "isset" thing in Java is just a hack around the fact that you can't have null primitives. I would vote for standardizing on default values being considered set at object creation. (An interesting detail is that in Java, the isset field only covers fields that's can't be natively set to null, so a default value for a struct, string, binary, or collection type actually would be serialized, while a bool, byte, short, int, long, or double would not be.)
        Hide
        Jens Geyer added a comment - - edited

        From my perspective, its insane to have a field have a value and not to serialize it. [...] I would vote for standardizing on default values being considered set at object creation.

        +1

        Show
        Jens Geyer added a comment - - edited From my perspective, its insane to have a field have a value and not to serialize it. [...] I would vote for standardizing on default values being considered set at object creation. +1
        Hide
        Jake Farrell added a comment -

        using the following example of a update service you can not tell if an optional primitive parameter was set or its using the default value without isset.

        service User(id, optional string name, optional bool isAwesome=false)
        first call User(id=1, isAwesome=true)
        second call User(id=1, name="foo")

        If the second call had to serialize the default optional value no matter what it would cause "foo" to no longer be awesome even thought it was originally set in the first call and was an optional parameter for the second call. I would prefer to go with nullable types across the board as well, but since this currently is not an option for primitives I think we need to keep isset for those types for now.

        Show
        Jake Farrell added a comment - using the following example of a update service you can not tell if an optional primitive parameter was set or its using the default value without isset. service User(id, optional string name, optional bool isAwesome=false) first call User(id=1, isAwesome=true) second call User(id=1, name="foo") If the second call had to serialize the default optional value no matter what it would cause "foo" to no longer be awesome even thought it was originally set in the first call and was an optional parameter for the second call. I would prefer to go with nullable types across the board as well, but since this currently is not an option for primitives I think we need to keep isset for those types for now.
        Hide
        Diwaker Gupta added a comment -

        From my perspective, its insane to have a field have a value and not to serialize it. In fact, the whole "isset" thing in Java is just a hack around the fact that you can't have null primitives. I would vote for standardizing on default values being considered set at object creation.

        1 for Bryan's suggestion. In fact, my first patch for THRIFT-1394 did precisely this for C. I'm happy to provide patches for at least C+ and Java so we always serialize optional fields with defaults specified, even if they were never explicitly set.

        I would prefer to go with nullable types across the board as well, but since this currently is not an option for primitives I think we need to keep isset for those types for now.

        Jake: there are nullable boxed types for most Java primitives (Int, Long, Boolean etc). If the Thrift compiler uses those, then we could have nullable types across the board right?

        Show
        Diwaker Gupta added a comment - From my perspective, its insane to have a field have a value and not to serialize it. In fact, the whole "isset" thing in Java is just a hack around the fact that you can't have null primitives. I would vote for standardizing on default values being considered set at object creation. 1 for Bryan's suggestion. In fact, my first patch for THRIFT-1394 did precisely this for C . I'm happy to provide patches for at least C + and Java so we always serialize optional fields with defaults specified, even if they were never explicitly set. I would prefer to go with nullable types across the board as well, but since this currently is not an option for primitives I think we need to keep isset for those types for now. Jake: there are nullable boxed types for most Java primitives (Int, Long, Boolean etc). If the Thrift compiler uses those, then we could have nullable types across the board right?
        Hide
        Stefan Gmeiner added a comment -

        +1 for Bryan's

        This would also fix the problem, that in Java fields could be set using public field access which will not set its corresponding isset-field.

        Show
        Stefan Gmeiner added a comment - +1 for Bryan's This would also fix the problem, that in Java fields could be set using public field access which will not set its corresponding isset-field.
        Hide
        Jake Farrell added a comment -

        Diwaker: Autoboxing adds unnecessary overhead and should not be used in performance intensive code

        Show
        Jake Farrell added a comment - Diwaker: Autoboxing adds unnecessary overhead and should not be used in performance intensive code
        Hide
        Jens Geyer added a comment -

        service User(id, optional string name, optional bool isAwesome=false)
        first call User(id=1, isAwesome=true)
        second call User(id=1, name="foo")

        If the second call had to serialize the default optional value no matter what it would cause "foo" to no longer be awesome even thought it was originally set in the first call and was an optional parameter for the second call.

        Mmmh.
        In essence this would mean, that there there will always be an (explicit or implicit) input value.

        If I don't overlook anything, we have two possible cases here:

        1. Either isAwesome is not really optional, since it always has an (explicit or implicit) value.
        2. Or, if we value the optional higher, omitted values would never be set, the default is obsolete.

        This raises the big question, what the intention of the entire construct "optionally set with default" would be? What is the expected behaviour, exactly? Are there any use cases for this combination?

        Show
        Jens Geyer added a comment - service User(id, optional string name, optional bool isAwesome=false) first call User(id=1, isAwesome=true) second call User(id=1, name="foo") If the second call had to serialize the default optional value no matter what it would cause "foo" to no longer be awesome even thought it was originally set in the first call and was an optional parameter for the second call. Mmmh. In essence this would mean, that there there will always be an (explicit or implicit) input value. If I don't overlook anything, we have two possible cases here: Either isAwesome is not really optional, since it always has an (explicit or implicit) value. Or, if we value the optional higher, omitted values would never be set, the default is obsolete. This raises the big question, what the intention of the entire construct "optionally set with default" would be? What is the expected behaviour, exactly? Are there any use cases for this combination?
        Hide
        Nate Rosenblum added a comment -

        I'm sympathetic to the position that "optionally set with default" is a weird combination of attributes for a message field.

        I think the most compelling argument in favor is protocol flexibility: making the field default means you don't have to program to it directly & can change it globally at some later date (e.g. protocol version fields), while keeping it optional means that you can delete the field completely in the future if you tire of it. Best of both worlds, I guess.

        Regardless, optional-with-default is the world we live in. We should standardize to serializing defaulted optional fields or not. It looks like consensus was forming for serializing them back in March. I'm happy to contribute patches if we can come to agreement on this.

        Show
        Nate Rosenblum added a comment - I'm sympathetic to the position that "optionally set with default" is a weird combination of attributes for a message field. I think the most compelling argument in favor is protocol flexibility: making the field default means you don't have to program to it directly & can change it globally at some later date (e.g. protocol version fields), while keeping it optional means that you can delete the field completely in the future if you tire of it. Best of both worlds, I guess. Regardless, optional-with-default is the world we live in. We should standardize to serializing defaulted optional fields or not. It looks like consensus was forming for serializing them back in March. I'm happy to contribute patches if we can come to agreement on this.
        Hide
        Nate Rosenblum added a comment -

        I've attached two patches to bring Java and C# defaulted-optional serialization in line with C++ and Python. The generated C# code looks like it's doing the right thing, but I am not an expert in that language so somebody should probably double check.

        Show
        Nate Rosenblum added a comment - I've attached two patches to bring Java and C# defaulted-optional serialization in line with C++ and Python. The generated C# code looks like it's doing the right thing, but I am not an expert in that language so somebody should probably double check.
        Hide
        Jens Geyer added a comment - - edited

        Nate,

        I just added a few sub-Tasks for some of the affected languages (we probably need another one for Python, do we?). If you move the patches to their respective tickets, I will commit the C# patch.

        Show
        Jens Geyer added a comment - - edited Nate, I just added a few sub-Tasks for some of the affected languages (we probably need another one for Python, do we?). If you move the patches to their respective tickets, I will commit the C# patch.
        Hide
        Nate Rosenblum added a comment -

        I've moved the Java and C# patches to the respective subtasks. To the best of my knowledge, I believe that Python already behaves like C++, so it does not need to be patched. That's what I read from the initial issue, anyway. Somebody else can chime in.

        Show
        Nate Rosenblum added a comment - I've moved the Java and C# patches to the respective subtasks. To the best of my knowledge, I believe that Python already behaves like C++, so it does not need to be patched. That's what I read from the initial issue, anyway. Somebody else can chime in.
        Hide
        Josh Hansen added a comment -

        For my use case, serializing optional-with-default fields when they're unset wastes a huge amount of disk space. The behavior I expected was that optional fields would not be serialized unless they are explicitly set, and I was surprised to discover that in Java at least this isn't always the case.

        As Bryan Duxbury mentioned above, whether or not an optional-with-default-value field is serialized in Java actually depends on the datatype of the field. Consider this struct, for example:
        struct A

        { 1: optional bool a = true, 2: optional byte b = 4, 3: optional i16 c = 12, 4: optional i32 d = 49303, 5: optional i64 e = 584393, 6: optional double f = 5.5, 7: optional string g = "some value" }

        When I instantiate a plain A object and don't set any fields on it and write it using TJSONProtocol, I get
        {"7":{"str":"some value"}}
        This is a horribly counterintuitive behavior. Either all of those fields should be serialized, or none of them should be.

        My vote is that we regularize this behavior and then make it controllable via a thrift compiler flag, e.g. thrift --serialize-unset-optionals-with-default=[true|false]

        Show
        Josh Hansen added a comment - For my use case, serializing optional-with-default fields when they're unset wastes a huge amount of disk space. The behavior I expected was that optional fields would not be serialized unless they are explicitly set, and I was surprised to discover that in Java at least this isn't always the case. As Bryan Duxbury mentioned above, whether or not an optional-with-default-value field is serialized in Java actually depends on the datatype of the field. Consider this struct, for example: struct A { 1: optional bool a = true, 2: optional byte b = 4, 3: optional i16 c = 12, 4: optional i32 d = 49303, 5: optional i64 e = 584393, 6: optional double f = 5.5, 7: optional string g = "some value" } When I instantiate a plain A object and don't set any fields on it and write it using TJSONProtocol, I get {"7":{"str":"some value"}} This is a horribly counterintuitive behavior. Either all of those fields should be serialized, or none of them should be. My vote is that we regularize this behavior and then make it controllable via a thrift compiler flag, e.g. thrift --serialize-unset-optionals-with-default= [true|false]
        Hide
        James Haggerty added a comment -

        As Josh mentioned, there are significant advantages to not serialising default values.

        Say you have a large Struct with a lot of boolean options in it, but 90% of the time those options don't move from their defaults (normally False). To get around this issue, you could try to implement booleans defaulting to False by using optional without a default (i.e. counting !isset/null/None as false). But abusing things like this means you can't distinguish between fields that are null because they are not filled in and fields that are null because that represents false... which is a pain when you're custom encoding the struct as JSON for the end user.

        IMO it doesn't make much sense to have 'optional with default' unless this behaviour is supported.

        Show
        James Haggerty added a comment - As Josh mentioned, there are significant advantages to not serialising default values. Say you have a large Struct with a lot of boolean options in it, but 90% of the time those options don't move from their defaults (normally False). To get around this issue, you could try to implement booleans defaulting to False by using optional without a default (i.e. counting !isset/null/None as false). But abusing things like this means you can't distinguish between fields that are null because they are not filled in and fields that are null because that represents false... which is a pain when you're custom encoding the struct as JSON for the end user. IMO it doesn't make much sense to have 'optional with default' unless this behaviour is supported.
        Hide
        James Haggerty added a comment -

        I guess the other way of working around this is to explicitly unset any field with default values, since the receiver will set them all again anyway... (doesn't this seem like a waste of time to you?).

        It would be really useful to have this functionality. Maybe this could be a TProtocol option?

        Show
        James Haggerty added a comment - I guess the other way of working around this is to explicitly unset any field with default values, since the receiver will set them all again anyway... (doesn't this seem like a waste of time to you?). — It would be really useful to have this functionality. Maybe this could be a TProtocol option?
        Hide
        Nate Rosenblum added a comment - - edited

        From my perspective, the downside of not serializing optional-defaulted is that it ties the runtime state of the protocol to the IDL file itself. I certainly see the inefficiencies inherent in serializing these fields if one makes heavy use of them & doesn't change the default values. Could I go back in time, I would argue against allowing this functionality at all; it seems less error prone to encode this defaulting directly in application logic, rather than to make it part of the protocol specification. I suppose one could argue either way, according to preference.

        The problem that I originally ran into was binary compatibility of protocol messages, specifically for computing signatures over the serialized data. I think it's perfectly reasonable to make the serializing behavior configurable, as long as binary compatibility remains an option.

        Show
        Nate Rosenblum added a comment - - edited From my perspective, the downside of not serializing optional-defaulted is that it ties the runtime state of the protocol to the IDL file itself. I certainly see the inefficiencies inherent in serializing these fields if one makes heavy use of them & doesn't change the default values. Could I go back in time, I would argue against allowing this functionality at all; it seems less error prone to encode this defaulting directly in application logic, rather than to make it part of the protocol specification. I suppose one could argue either way, according to preference. The problem that I originally ran into was binary compatibility of protocol messages, specifically for computing signatures over the serialized data. I think it's perfectly reasonable to make the serializing behavior configurable, as long as binary compatibility remains an option.
        Hide
        James Haggerty added a comment -

        Yeah, I see the ugliness of defaults set on optionals (i.e. so sometimes the server sets its default, and sometimes the client sets its default, depending on who's receiving). But you have this problem regardless of whether you serialise defaults, you just flip the source of the definition (creator vs. receiver).

        It is nice for documentation, though, as it means that users can easily see what the default behaviour is (rather than you maintaining the comment for the field and the code separately).

        Show
        James Haggerty added a comment - Yeah, I see the ugliness of defaults set on optionals (i.e. so sometimes the server sets its default, and sometimes the client sets its default, depending on who's receiving). But you have this problem regardless of whether you serialise defaults, you just flip the source of the definition (creator vs. receiver). It is nice for documentation, though, as it means that users can easily see what the default behaviour is (rather than you maintaining the comment for the field and the code separately).

          People

          • Assignee:
            Unassigned
            Reporter:
            Stefan Gmeiner
          • Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:

              Development