Thrift
  1. Thrift
  2. THRIFT-1394

Treatment of optional fields is not consistent between C++ and Java

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.9
    • Labels:
    • Environment:

      Mac OSX, Thrift trunk

      Description

      Motivation

      We are trying to implement message signing in the application layer for an internal project. The transport is over Thrift – the server side is using Java and the client side is using C++. We noticed that the message signing would fail because the client and server would serialize the same Thrift object in different ways.

      Problem

      The semantics of "optional" fields differ between thrift code generated for Java and CPP. In CPP, all optional fields are guarded by the isset helper struct. On Java, however, the generated code takes advantage of nullable types: for containers, structs, exceptions, enums, and, notably, strings, the generator elides explicit use of an "isset" bit vector and instead emits checks of the form "field null". This leads to varying behavior between the two languages: an optional string field with a default value will have isset[fieldid] false on C, but the equivalent test in Java will be true.

      To be concrete, consider the following example:

      struct Foo {
          1: optional string bar = "hello";
          2: optional i32 baz;
      }
      

      In C++, the checks for 'bar' and 'i' are similar:

        if (this->__isset.bar) {
          xfer += oprot->writeFieldBegin("bar", ::apache::thrift::protocol::T_STRING, 1);
          xfer += oprot->writeString(this->bar);
          xfer += oprot->writeFieldEnd();
        }
        if (this->__isset.baz) {
          xfer += oprot->writeFieldBegin("baz", ::apache::thrift::protocol::T_I32, 2);
          xfer += oprot->writeI32(this->baz);
          xfer += oprot->writeFieldEnd();
        }
      

      However, in Java, the checks are different:

        /** Returns true if field bar is set (has been assigned a value) and false otherwise */
        public boolean isSetBar() {
          return this.bar != null;
        }
        /** Returns true if field baz is set (has been assigned a value) and false otherwise */
        public boolean isSetBaz() {
          return __isset_bit_vector.get(__BAZ_ISSET_ID);
        }
      

      As a result, when the Java object is serialized, the value for 'bar' is included. But when the C++ object is serialized, the value for 'bar' is NOT included.

      For a system like Thrift, I would eschew a few bytes saved over correctness and reliability. As a user of Thrift, I do expect that the wire data generated for identical Thrift objects will be identical, regardless of the language used.

      Proposal

      We already use a BitSet to track primitive types in Java. The compiler should extend the bit vector to also guard nullable types, to be consistent with C++. This is pretty easy and low impact – I'm happy to provide a patch.

      1. THRIFT-1394.patch
        2 kB
        Diwaker Gupta

        Issue Links

          Activity

          Henrique Mendonça made changes -
          Link This issue is duplicated by THRIFT-1862 [ THRIFT-1862 ]
          Hide
          Hudson added a comment -

          Integrated in Thrift #398 (See https://builds.apache.org/job/Thrift/398/)
          THRIFT-1394:Treatment of optional fields is not consistent between C++ and Java
          Client: cpp
          Patch: Diwaker Gupta

          In CPP, all optional fields are guarded by the isset helper struct. On Java, however, the generated code takes advantage of nullable types: for containers, structs, exceptions, enums, and, notably, strings, the generator elides explicit use of an "isset" bit vector and instead emits checks of the form "field null". This leads to varying behavior between the two languages: an optional string field with a default value will have isset[fieldid] false on C, but the equivalent test in Java will be true.

          jfarrell : http://svn.apache.org/viewvc/?view=rev&rev=1236529
          Files :

          • /thrift/trunk/compiler/cpp/src/generate/t_cpp_generator.cc
          • /thrift/trunk/lib/cpp/test/OptionalRequiredTest.cpp
          • /thrift/trunk/test/OptionalRequiredTest.thrift
          Show
          Hudson added a comment - Integrated in Thrift #398 (See https://builds.apache.org/job/Thrift/398/ ) THRIFT-1394 :Treatment of optional fields is not consistent between C++ and Java Client: cpp Patch: Diwaker Gupta In CPP, all optional fields are guarded by the isset helper struct. On Java, however, the generated code takes advantage of nullable types: for containers, structs, exceptions, enums, and, notably, strings, the generator elides explicit use of an "isset" bit vector and instead emits checks of the form "field null". This leads to varying behavior between the two languages: an optional string field with a default value will have isset [fieldid] false on C, but the equivalent test in Java will be true. jfarrell : http://svn.apache.org/viewvc/?view=rev&rev=1236529 Files : /thrift/trunk/compiler/cpp/src/generate/t_cpp_generator.cc /thrift/trunk/lib/cpp/test/OptionalRequiredTest.cpp /thrift/trunk/test/OptionalRequiredTest.thrift
          Jake Farrell made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Jake Farrell made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Jake Farrell [ jfarrell ] Diwaker Gupta [ diwaker ]
          Fix Version/s 0.9 [ 12316294 ]
          Resolution Fixed [ 1 ]
          Hide
          Jake Farrell added a comment -

          Committed

          Show
          Jake Farrell added a comment - Committed
          Hide
          Diwaker Gupta added a comment -

          Bump. Jake?

          Show
          Diwaker Gupta added a comment - Bump. Jake?
          Hide
          Diwaker Gupta added a comment -

          Jake, considering that the current C++ <-> Java wire translation is inconsistent and likely to be inconsistent across other languages as well AND considering that this hasn't been a huge issue in the community so far, my sense is that most Thrift users just haven't run into this.

          Given our limited resources, we are unable to provide patches for other client libraries. But I don't see why that should prevent this patch from going in – it passes all tests and doesn't break any existing clients and/or functionality.

          What would it take to get this patch into the next release?

          Show
          Diwaker Gupta added a comment - Jake, considering that the current C++ <-> Java wire translation is inconsistent and likely to be inconsistent across other languages as well AND considering that this hasn't been a huge issue in the community so far, my sense is that most Thrift users just haven't run into this. Given our limited resources, we are unable to provide patches for other client libraries. But I don't see why that should prevent this patch from going in – it passes all tests and doesn't break any existing clients and/or functionality. What would it take to get this patch into the next release?
          Hide
          Jake Farrell added a comment -

          Diwaker, this patch fixes the immediate problem you have with c++ and java, but what about other client libraries (ruby, python, php, etc..) ?

          Show
          Jake Farrell added a comment - Diwaker, this patch fixes the immediate problem you have with c++ and java, but what about other client libraries (ruby, python, php, etc..) ?
          Diwaker Gupta made changes -
          Attachment THRIFT-1394.patch [ 12499604 ]
          Diwaker Gupta made changes -
          Field Original Value New Value
          Description *Motivation*

          We are trying to implement message signing in the application layer for an internal project. The transport is over Thrift -- the server side is using Java and the client side is using C++. We noticed that the message signing would fail because the client and server would serialize the same Thrift object in different ways.

          *Problem*

          The semantics of "optional" fields differ between thrift code generated for Java and CPP. In CPP, all optional fields are guarded by the {{isset}} helper struct. On Java, however, the generated code takes advantage of nullable types: for containers, structs, exceptions, enums, and, notably, strings, the generator elides explicit use of an "isset" bit vector and instead emits checks of the form "field null". This leads to varying behavior between the two languages: an optional string field with a default value will have {{isset[fieldid]}} false on C, but the equivalent test in Java will be true.

          To be concrete, consider the following example:

          {code}
          struct Bar {
              1: required i32 foo;
          }

          struct Foo {
              1: optional Bar bar;
              2: optional i32 i;
          }
          {code}

          In C++, the checks for 'bar' and 'i' are similar:

          {code}
            if (this->__isset.bar) {
              xfer += oprot->writeFieldBegin("bar", ::apache::thrift::protocol::T_STRUCT, 1);
              xfer += this->bar.write(oprot);
              xfer += oprot->writeFieldEnd();
            }
            if (this->__isset.i) {
              xfer += oprot->writeFieldBegin("i", ::apache::thrift::protocol::T_I32, 2);
              xfer += oprot->writeI32(this->i);
              xfer += oprot->writeFieldEnd();
            }
          {code}

          However, in Java, the checks are different:

          {code}
            public boolean isSetI() {
              return __isset_bit_vector.get(__I_ISSET_ID);
            }

            /** Returns true if field bar is set (has been assigned a value) and false otherwise */
            public boolean isSetBar() {
              return this.bar != null;
            }
          {code}

          For a system like Thrift, I would eschew a few bytes saved over correctness and reliability. As a user of Thrift, I *do* expect that the wire data generated for identical Thrift objects will be identical, regardless of the language used.

          *Proposal*

          We already use a BitSet to track primitive types in Java. The compiler should extend the bit vector to also guard nullable types, to be consistent with C++. This is pretty easy and low impact -- I'm happy to provide a patch.
          *Motivation*

          We are trying to implement message signing in the application layer for an internal project. The transport is over Thrift -- the server side is using Java and the client side is using C++. We noticed that the message signing would fail because the client and server would serialize the same Thrift object in different ways.

          *Problem*

          The semantics of "optional" fields differ between thrift code generated for Java and CPP. In CPP, all optional fields are guarded by the {{isset}} helper struct. On Java, however, the generated code takes advantage of nullable types: for containers, structs, exceptions, enums, and, notably, strings, the generator elides explicit use of an "isset" bit vector and instead emits checks of the form "field null". This leads to varying behavior between the two languages: an optional string field with a default value will have {{isset[fieldid]}} false on C, but the equivalent test in Java will be true.

          To be concrete, consider the following example:

          {code}
          struct Foo {
              1: optional string bar = "hello";
              2: optional i32 baz;
          }
          {code}

          In C++, the checks for 'bar' and 'i' are similar:

          {code}
            if (this->__isset.bar) {
              xfer += oprot->writeFieldBegin("bar", ::apache::thrift::protocol::T_STRING, 1);
              xfer += oprot->writeString(this->bar);
              xfer += oprot->writeFieldEnd();
            }
            if (this->__isset.baz) {
              xfer += oprot->writeFieldBegin("baz", ::apache::thrift::protocol::T_I32, 2);
              xfer += oprot->writeI32(this->baz);
              xfer += oprot->writeFieldEnd();
            }
          {code}

          However, in Java, the checks are different:

          {code}
            /** Returns true if field bar is set (has been assigned a value) and false otherwise */
            public boolean isSetBar() {
              return this.bar != null;
            }
            /** Returns true if field baz is set (has been assigned a value) and false otherwise */
            public boolean isSetBaz() {
              return __isset_bit_vector.get(__BAZ_ISSET_ID);
            }
          {code}

          As a result, when the Java object is serialized, the value for 'bar' _is_ included. But when the C++ object is serialized, the value for 'bar' is NOT included.

          For a system like Thrift, I would eschew a few bytes saved over correctness and reliability. As a user of Thrift, I *do* expect that the wire data generated for identical Thrift objects will be identical, regardless of the language used.

          *Proposal*

          We already use a BitSet to track primitive types in Java. The compiler should extend the bit vector to also guard nullable types, to be consistent with C++. This is pretty easy and low impact -- I'm happy to provide a patch.
          Diwaker Gupta created issue -

            People

            • Assignee:
              Diwaker Gupta
              Reporter:
              Diwaker Gupta
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development