Thrift
  1. Thrift
  2. THRIFT-1920 Binary Type
  3. THRIFT-830

Switch binary field implementation from byte[] to ByteBuffer

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Labels:
      None

      Description

      Instead of using byte[] as the implementation for binary fields, let's use ByteBuffer.

      There's nothing that you can do with byte[] that you can't also do with ByteBuffer, and there are more things you can do with ByteBuffer. It opens the way for us to avoid needless buffer copies on serialization and deserialization. It gives us a generally accepted equals() and compareTo() implementation, so we don't have to have custom cases for that anymore.

      Making this change will probably cause more than a little bit of trauma, changing the method signatures in both TProtocol and generated code. It's possible that I could be persuaded to support a command line switch for producing old-style byte[] methods in some contexts, but I'd love not to waste time supporting suboptimal features.

      1. thrift-830.patch
        36 kB
        Bryan Duxbury

        Issue Links

          Activity

          Hide
          Bryan Duxbury added a comment -

          Here's my first go at this change. All the tests pass.

          Show
          Bryan Duxbury added a comment - Here's my first go at this change. All the tests pass.
          Hide
          Jonathan Ellis added a comment -

          Cassandra is testing this over in CASSANDRA-1324.

          Show
          Jonathan Ellis added a comment - Cassandra is testing this over in CASSANDRA-1324 .
          Hide
          Nate McCall added a comment - - edited

          After some initial experimentation patching these changes into Cassandra I have found some strange behavior.

          It appears the message is not being broken up correctly into component parts. When I examine the message to retrieve the key, it appears the whole message body is intact within the buffer. For example when I expect to receive "k1", the buffer contains the entire contents of a Cassandra insert:

          [0, 0, 0, 6, 105, 110, 115, 101, 114, 116, 1, 0, 0, 0, 4, 11, 0, 1, 0, 0, 0, 2, 107, 48, 12, 0, 2, 11, 0, 3, 0, 0, 0, 8, 73, 110, 100, 101, 120, 101, 100, 49, 0, 12, 0, 3, 11, 0, 1, 0, 0, 0, 9, 98, 105, 114, 116, 104, 100, 97, 116, 101, 11, 0, 2, 0, 0, 0, 8, 49, 57, 55, 53, 49, 50, 51, 48, 12, 0, 3, 10, 0, 1, 0, 4, -116, -127, 83, 19, -95, -32, 0, 0, 8, 0, 4, 0, 0, 0, 1, 0]

          Where previously, it was simply [107,49]

          I will try to distill this into a reproducable test case of some sort, but hopefully the above can at least provide some information.

          Edit: this is with TBinaryProtocol and TFramedTransport

          Show
          Nate McCall added a comment - - edited After some initial experimentation patching these changes into Cassandra I have found some strange behavior. It appears the message is not being broken up correctly into component parts. When I examine the message to retrieve the key, it appears the whole message body is intact within the buffer. For example when I expect to receive "k1", the buffer contains the entire contents of a Cassandra insert: [0, 0, 0, 6, 105, 110, 115, 101, 114, 116, 1, 0, 0, 0, 4, 11, 0, 1, 0, 0, 0, 2, 107, 48, 12, 0, 2, 11, 0, 3, 0, 0, 0, 8, 73, 110, 100, 101, 120, 101, 100, 49, 0, 12, 0, 3, 11, 0, 1, 0, 0, 0, 9, 98, 105, 114, 116, 104, 100, 97, 116, 101, 11, 0, 2, 0, 0, 0, 8, 49, 57, 55, 53, 49, 50, 51, 48, 12, 0, 3, 10, 0, 1, 0, 4, -116, -127, 83, 19, -95, -32, 0, 0, 8, 0, 4, 0, 0, 0, 1, 0] Where previously, it was simply [107,49] I will try to distill this into a reproducable test case of some sort, but hopefully the above can at least provide some information. Edit: this is with TBinaryProtocol and TFramedTransport
          Hide
          Jonathan Ellis added a comment -

          right, the idea is that we'll use the ByteBuffer offset and limit to avoid having to copy out each binary field into a new byte[]

          Show
          Jonathan Ellis added a comment - right, the idea is that we'll use the ByteBuffer offset and limit to avoid having to copy out each binary field into a new byte[]
          Hide
          Bryan Duxbury added a comment -

          I think Johnathan beat me to the punch, but generally, you can use array() to get the underlying array, and then the contents between .position() and .limit(). The entire message will be in the underlying array, but you're just supposed to not look at it

          Show
          Bryan Duxbury added a comment - I think Johnathan beat me to the punch, but generally, you can use array() to get the underlying array, and then the contents between .position() and .limit(). The entire message will be in the underlying array, but you're just supposed to not look at it
          Hide
          Nate McCall added a comment -

          Wow, I missed that part completely. Sorry guys. I'll let you know how it turns out after I account for the above.

          Show
          Nate McCall added a comment - Wow, I missed that part completely. Sorry guys. I'll let you know how it turns out after I account for the above.
          Hide
          Bryan Duxbury added a comment -

          Nate - has your testing progressed far enough that you think this should be committed?

          Show
          Bryan Duxbury added a comment - Nate - has your testing progressed far enough that you think this should be committed?
          Hide
          Nate McCall added a comment -

          I've only implemented this partially (simple inserts and gets), but it "worked" for that. I'll be back on this next week and will let you know immediately if anything comes up.

          If you want to commit it, this would help us somewhat coordinating with THRIFT-831 (could serve as a baseline for those patches until that is in), but thats totally up to you.

          Show
          Nate McCall added a comment - I've only implemented this partially (simple inserts and gets), but it "worked" for that. I'll be back on this next week and will let you know immediately if anything comes up. If you want to commit it, this would help us somewhat coordinating with THRIFT-831 (could serve as a baseline for those patches until that is in), but thats totally up to you.
          Hide
          David Reiss added a comment -

          LGTM. This will be a big improvement.

          Show
          David Reiss added a comment - LGTM. This will be a big improvement.
          Hide
          Mathias Herberts added a comment -

          For whatever it's worth, I don't feel comfortable with this issue, replacing byte[] with ByteBuffer will break tons of code, thus making andry quite a few people who are already coping with thrift betaness. Not good for adoption.

          And on a more personal point of view it will also break the GWT compatibility patch I've crafted for Thrift as GWT does not have ByteBuffer in its JRE emulation layer.

          Therefore -1.

          Show
          Mathias Herberts added a comment - For whatever it's worth, I don't feel comfortable with this issue, replacing byte[] with ByteBuffer will break tons of code, thus making andry quite a few people who are already coping with thrift betaness. Not good for adoption. And on a more personal point of view it will also break the GWT compatibility patch I've crafted for Thrift as GWT does not have ByteBuffer in its JRE emulation layer. Therefore -1.
          Hide
          Bryan Duxbury added a comment -

          I recognize that it will break a lot of code. Luckily, that breakage is pretty trivial, though. Also, we're making this change at a version boundary, so we can announce the change cautiously.

          As far as the GWT stuff goes, would it make sense to spin that off into a separate generator that inherits from the standard Java generator, and therefore can make its own adjustments?

          Show
          Bryan Duxbury added a comment - I recognize that it will break a lot of code. Luckily, that breakage is pretty trivial, though. Also, we're making this change at a version boundary, so we can announce the change cautiously. As far as the GWT stuff goes, would it make sense to spin that off into a separate generator that inherits from the standard Java generator, and therefore can make its own adjustments?
          Hide
          Mathias Herberts added a comment -

          The way my GWT extension is done is by generating special stripped down POJOs intended to be used on the GWT client side and extra methods in regular Java generated classes to convert to/from those stripped down POJOs.

          Additionaly all the GWT service interfaces (both async and sync) are also generated thus speeding up development.

          Thus my GWT extension is intended more as a bridge than a standalone version of generated code.

          I'll make it evolve so the GWT POJOs still use byte[] and the conversion layer correctly converts to/from ByteBuffer to byte[].

          I retract my -1, if it improves the 'vanilla' java code, I'll adapt.

          My -1 was just me being lazy...

          Show
          Mathias Herberts added a comment - The way my GWT extension is done is by generating special stripped down POJOs intended to be used on the GWT client side and extra methods in regular Java generated classes to convert to/from those stripped down POJOs. Additionaly all the GWT service interfaces (both async and sync) are also generated thus speeding up development. Thus my GWT extension is intended more as a bridge than a standalone version of generated code. I'll make it evolve so the GWT POJOs still use byte[] and the conversion layer correctly converts to/from ByteBuffer to byte[]. I retract my -1, if it improves the 'vanilla' java code, I'll adapt. My -1 was just me being lazy...
          Hide
          Bryan Duxbury added a comment -

          I just committed this.

          Show
          Bryan Duxbury added a comment - I just committed this.

            People

            • Assignee:
              Bryan Duxbury
              Reporter:
              Bryan Duxbury
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development