Thrift
  1. Thrift
  2. THRIFT-872

Add 'primitive' option to 'Java' code generator

    Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.4
    • Fix Version/s: None
    • Component/s: Java - Compiler
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I'm attaching a patch that modifies 0.4.0 to add a new 'primitive' compiler option to the Java code generator that will produce a style of code that reduces library dependencies for generated struct and service interfaces. This will make the generated code easier to use in some contexts and more compatible with generated code from prior Thrift releases.

      The 'primitive' option changes generated code in the following ways:

      • Removes dependencies on: BitSet, ByteBuffer, and the third-party org.slf4j.Logger*
      • The 'isset' vector is implemented via boolean[] instead of BitSet
      • The 'Iface' interface for service 'Foo' is moved from an inner class to top-level FooIface.jar (and replaced with dummy Foo.Iface which just extends FooIface)
      • 'binary' fields from the IDL are changed from ByteBuffer back to byte[]

      The patch also includes runtime support library changes:

      • Added writeBinary(byte[]) and readBytes() methods to TProtocol to read and write byte[] primitives
      • Added TBaseHelper.toString(byte[],StringBuilder)

      To use (e.g.): thrift -r --gen java:beans,primitive Foo.thrift

      Rationale:
      1) The generated structures and services are more compatible with previous versions (in particular, v 0.2.0), requiring fewer code changes for existing projects upgrading to 0.4.0.
      2) The generated POJO structures and service interfaces have a much lower external dependency "footprint", so may be used more easily in platforms and libraries. For example, the structure .java files and the FooIface.java files may be used within restricted environments like GWT, which don't support java.nio.ByteBuffer, java.util.BitSet, or org.slf4j. (http://www.projectpossibility.org/projects/word_prediction/gwt-linux-1.4.60/doc/html/jre.html)
      3) In some contexts, 'binary' data fields may require fewer lines of code to juggle and serialize slightly faster.

      1. java-primitive-872.patch
        16 kB
        Dave Engberg
      2. java-primitive-872-v2.patch
        17 kB
        Dave Engberg
      3. java-primitive-872-v3.patch
        17 kB
        Dave Engberg

        Activity

        Hide
        Bryan Duxbury added a comment -

        What about making TProtocol's readBinary method non-abstract like this:

        public final byte[] readBytes() {
          ByteBuffer buf = readBinary();
          byte[] bytes = new byte[buf.capacity()];
          buf.get(buf);
          return buf;
        }
        

        I'm not crazy about having two separate implementations of readBinary in TBinary protocol. I'd like to find a way for them to share code if that's feasible.

        You've introduced some tabs into t_java_generator.

        Everything compiles fine and doesn't break any tests, so I'm inclined to believe things are all good. However, it's worth noting that essentially none of the changes you've made are covered by the existing tests.

        Show
        Bryan Duxbury added a comment - What about making TProtocol's readBinary method non-abstract like this: public final byte [] readBytes() { ByteBuffer buf = readBinary(); byte [] bytes = new byte [buf.capacity()]; buf.get(buf); return buf; } I'm not crazy about having two separate implementations of readBinary in TBinary protocol. I'd like to find a way for them to share code if that's feasible. You've introduced some tabs into t_java_generator. Everything compiles fine and doesn't break any tests, so I'm inclined to believe things are all good. However, it's worth noting that essentially none of the changes you've made are covered by the existing tests.
        Hide
        Dave Engberg added a comment -

        That would result in a fair amount of extra data copying. For example, the TBinaryProtocol.readBinary() method currently does this:

        public ByteBuffer readBinary() throws TException {
        int size = readI32();
        checkReadLength(size);
        if (trans_.getBytesRemainingInBuffer() >= size)

        { ByteBuffer bb = ByteBuffer.wrap(trans_.getBuffer(), trans_.getBufferPosition(), size); trans_.consumeBuffer(size); return bb; }

        byte[] buf = new byte[size];
        trans_.readAll(buf, 0, size);
        return ByteBuffer.wrap(buf);
        }

        So if you really just want to get a byte[] array out of the TProtocol object, you'd be creating the appropriate byte array once within readBinary(), then wrapping that up in a new ByteBuffer instance only to immediately create a duplicate byte[] and copying the data out. I'd be more inclined to go the other way, and have the default (non-abstract) readBinary() implementation invoke the byte[] version, since it can use the results without duplication. I'd originally considered doing so, but didn't want to step on toes, so left them separate.

        Sorry about the tabs ... my emacs modes have gotten screwed up in the months since the last time I touched C/C++ (i.e. the last time I did some stuff with Thrift). I can submit a replacement patch whenever other issues are consolidated.

        Show
        Dave Engberg added a comment - That would result in a fair amount of extra data copying. For example, the TBinaryProtocol.readBinary() method currently does this: public ByteBuffer readBinary() throws TException { int size = readI32(); checkReadLength(size); if (trans_.getBytesRemainingInBuffer() >= size) { ByteBuffer bb = ByteBuffer.wrap(trans_.getBuffer(), trans_.getBufferPosition(), size); trans_.consumeBuffer(size); return bb; } byte[] buf = new byte [size] ; trans_.readAll(buf, 0, size); return ByteBuffer.wrap(buf); } So if you really just want to get a byte[] array out of the TProtocol object, you'd be creating the appropriate byte array once within readBinary(), then wrapping that up in a new ByteBuffer instance only to immediately create a duplicate byte[] and copying the data out. I'd be more inclined to go the other way, and have the default (non-abstract) readBinary() implementation invoke the byte[] version, since it can use the results without duplication. I'd originally considered doing so, but didn't want to step on toes, so left them separate. Sorry about the tabs ... my emacs modes have gotten screwed up in the months since the last time I touched C/C++ (i.e. the last time I did some stuff with Thrift). I can submit a replacement patch whenever other issues are consolidated.
        Hide
        Dave Engberg added a comment -

        Replacement patch. Fixes the embarrassing \t tab chars, and fixes the 'compare' operation for binary fields with the primitive style:

        • unequal = "!this." + name + ".equals(that." + name + ")";
          + if (primitive_style_) { + unequal = "TBaseHelper.compareTo(this." + name + ", that." + + name + ") != 0"; + }

          else

          { + unequal = "!this." + name + ".equals(that." + name + ")"; + }
        Show
        Dave Engberg added a comment - Replacement patch. Fixes the embarrassing \t tab chars, and fixes the 'compare' operation for binary fields with the primitive style: unequal = "!this." + name + ".equals(that." + name + ")"; + if (primitive_style_) { + unequal = "TBaseHelper.compareTo(this." + name + ", that." + + name + ") != 0"; + } else { + unequal = "!this." + name + ".equals(that." + name + ")"; + }
        Hide
        Bryan Duxbury added a comment -

        That would result in a fair amount of extra data copying.

        I think that actually might not be true. If you're using a direct buffer access transport (Framed or MemoryInput, for instance), then the copy into the byte[] in readBinary() would actually be the first copy. Further, if the direct access path isn't taken, then the ByteBuffer you get back will necessarily be the exact size you want, so you should be able to just return readBinary().array().

        Your patch has the THRIFT-877 changes in it still. Do an svn up.

        How about we avoid the if in the equals/compareto areas by just having TBaseHelper called always? We can add a compareTo(ByteBuffer, ByteBuffer) that lets us call the same method regardless.

        I don't think that we should completely get rid of the logging of errors if we're in primitive style; how about we just write to System.err instead?

        Show
        Bryan Duxbury added a comment - That would result in a fair amount of extra data copying. I think that actually might not be true. If you're using a direct buffer access transport (Framed or MemoryInput, for instance), then the copy into the byte[] in readBinary() would actually be the first copy. Further, if the direct access path isn't taken, then the ByteBuffer you get back will necessarily be the exact size you want, so you should be able to just return readBinary().array(). Your patch has the THRIFT-877 changes in it still. Do an svn up. How about we avoid the if in the equals/compareto areas by just having TBaseHelper called always? We can add a compareTo(ByteBuffer, ByteBuffer) that lets us call the same method regardless. I don't think that we should completely get rid of the logging of errors if we're in primitive style; how about we just write to System.err instead?
        Hide
        Dave Engberg added a comment -

        Let's say that there are some narrow cases where it's only slightly less efficient than my proposal, and other cases (if you happen to use the wrong transport) where it can be much worse. (E.g. we accept binary fields up to 50MB, so copying that an extra time if we're using TBinaryTransport isn't desirable.) I.e. I wouldn't say that's worth reducing that class from

        I don't have a preference on the compareTo implementation ... we don't happen to use this.

        The generated code only had a single line of logging, which didn't seem to add much value at all (i.e. not enough to justify a runtime dependency on the slf4j library). The standard Java Logging framework would make a lot more sense than tightly coupling Thrift to slf4j, so if we really want to keep that single error logging line in the service generation, I'd push for making it use standard Java logging instead. System.err would be uglier. Either way, we'll have to go back to post-processing that line out of the generated code for client-style contexts where it doesn't make sense (e.g. use from GWT).

        Show
        Dave Engberg added a comment - Let's say that there are some narrow cases where it's only slightly less efficient than my proposal, and other cases (if you happen to use the wrong transport) where it can be much worse. (E.g. we accept binary fields up to 50MB, so copying that an extra time if we're using TBinaryTransport isn't desirable.) I.e. I wouldn't say that's worth reducing that class from I don't have a preference on the compareTo implementation ... we don't happen to use this. The generated code only had a single line of logging, which didn't seem to add much value at all (i.e. not enough to justify a runtime dependency on the slf4j library). The standard Java Logging framework would make a lot more sense than tightly coupling Thrift to slf4j, so if we really want to keep that single error logging line in the service generation, I'd push for making it use standard Java logging instead. System.err would be uglier. Either way, we'll have to go back to post-processing that line out of the generated code for client-style contexts where it doesn't make sense (e.g. use from GWT).
        Hide
        Dave Engberg added a comment -

        Rebuilt patch from the trunk after svn update.

        Show
        Dave Engberg added a comment - Rebuilt patch from the trunk after svn update.
        Hide
        Bryan Duxbury added a comment -

        I wouldn't say that's worth reducing that class from

        Was there an end to this sentence?

        Does this change overlap with the proposed JavaME generator?

        Show
        Bryan Duxbury added a comment - I wouldn't say that's worth reducing that class from Was there an end to this sentence? Does this change overlap with the proposed JavaME generator?
        Hide
        Dave Engberg added a comment -

        Oops, yeah.
        "I wouldn't say that the performance hit for many users would justify reducing TProtocol from 42 abstract methods to 41 abstract methods. If we really want to reduce the redundancy of the two methods, I'd have have a default implementation of the ByteBuffer version that uses the byte[] instead of vice versa."

        Show
        Dave Engberg added a comment - Oops, yeah. "I wouldn't say that the performance hit for many users would justify reducing TProtocol from 42 abstract methods to 41 abstract methods. If we really want to reduce the redundancy of the two methods, I'd have have a default implementation of the ByteBuffer version that uses the byte[] instead of vice versa."
        Hide
        Bryan Duxbury added a comment -

        Pushing this for now. Dave - please see my question about whether this overlaps with the JavaME generator. If so, let's just close this one. Otherwise, we can get it committed next release.

        Show
        Bryan Duxbury added a comment - Pushing this for now. Dave - please see my question about whether this overlaps with the JavaME generator. If so, let's just close this one. Otherwise, we can get it committed next release.
        Hide
        Dave Engberg added a comment -

        No, I don't think that a full J2SE/J2EE implementation can be cleanly merged with a JavaME implementation, since JavaSE is basically an ancient subset that then moved away from the original roots with its IO libraries, etc. There are no generics, no iterators, no 'enum', the only collections are Vector and Hashtable, etc.

        I thought about trying to put the JavaME generation into the core Java generator as conditional options (like I originally did with the JavaBean stuff), but the conditional logic everywhere would have made the generator very messy.

        Show
        Dave Engberg added a comment - No, I don't think that a full J2SE/J2EE implementation can be cleanly merged with a JavaME implementation, since JavaSE is basically an ancient subset that then moved away from the original roots with its IO libraries, etc. There are no generics, no iterators, no 'enum', the only collections are Vector and Hashtable, etc. I thought about trying to put the JavaME generation into the core Java generator as conditional options (like I originally did with the JavaBean stuff), but the conditional logic everywhere would have made the generator very messy.
        Hide
        Bryan Duxbury added a comment -

        I'm not talking about combining them. I'm asking, does the JavaME generator make the changes in this ticket unnecessary?

        Show
        Bryan Duxbury added a comment - I'm not talking about combining them. I'm asking, does the JavaME generator make the changes in this ticket unnecessary?
        Hide
        Dave Engberg added a comment -

        Evernote uses both independently. The JavaME generator and runtime library is used on phones from Blackberry, Sony-Ericsson, and some phones in Japan. The Java code is used server-side in J2SE, in a browser via Google Web Toolkit (GWT) compilation into JS, and on Android phones.

        So we'll continue to use both, but if a lower dependency footprint doesn't seem useful to others, we can just stick with our home-brewed 0.4.0++ forked generator and libraries without trying to stay in sync.

        Show
        Dave Engberg added a comment - Evernote uses both independently. The JavaME generator and runtime library is used on phones from Blackberry, Sony-Ericsson, and some phones in Japan. The Java code is used server-side in J2SE, in a browser via Google Web Toolkit (GWT) compilation into JS, and on Android phones. So we'll continue to use both, but if a lower dependency footprint doesn't seem useful to others, we can just stick with our home-brewed 0.4.0++ forked generator and libraries without trying to stay in sync.
        Hide
        Artur Biesiadowski added a comment -

        boolean[] for isset management is bit of a waste, memory-wise. This is probably complicating the implementation considerably, but maybe just expanded integer fields could be used to hold bits - something name like isset1, isset2, etc, with generator smart enough to access proper isset fields for given bit lookup? This would equal to somehow simplified BitSet implementation inlined in the java class.

        In fact, it could be beneficial even for the non-primitive part of the code...

        Show
        Artur Biesiadowski added a comment - boolean[] for isset management is bit of a waste, memory-wise. This is probably complicating the implementation considerably, but maybe just expanded integer fields could be used to hold bits - something name like isset1, isset2, etc, with generator smart enough to access proper isset fields for given bit lookup? This would equal to somehow simplified BitSet implementation inlined in the java class. In fact, it could be beneficial even for the non-primitive part of the code...
        Hide
        Dave Engberg added a comment -

        Yes, if you look at the standard implementation of java.util.BitSet and consider the extra object overhead in the runtime for BitSet+long[] vs. boolean[] , you'll need a lot of fields in your struct before BitSet uses less memory than a boolean[]. I.e. if I have a struct with 12 fields, the total memory used by the JVM for boolean[12] will be less than the corresponding BitSet with its reference to a long[1].

        Inline, named int fields would use less memory than either, however.

        Show
        Dave Engberg added a comment - Yes, if you look at the standard implementation of java.util.BitSet and consider the extra object overhead in the runtime for BitSet+long[] vs. boolean[] , you'll need a lot of fields in your struct before BitSet uses less memory than a boolean[]. I.e. if I have a struct with 12 fields, the total memory used by the JVM for boolean [12] will be less than the corresponding BitSet with its reference to a long [1] . Inline, named int fields would use less memory than either, however.
        Hide
        Bryan Duxbury added a comment -

        Dave - I'm looking at committing this patch, but it doesn't apply cleanly anymore. Any chance you'd be interested in updating it for trunk?

        Show
        Bryan Duxbury added a comment - Dave - I'm looking at committing this patch, but it doesn't apply cleanly anymore. Any chance you'd be interested in updating it for trunk?
        Hide
        Andrew Gaul added a comment -

        Does THRIFT-1469 close this issue?

        Show
        Andrew Gaul added a comment - Does THRIFT-1469 close this issue?
        Hide
        Dave Engberg added a comment -

        My patch was more specifically intended to reduce the runtime library
        dependencies introduced to the Java generated code from versions 0.3+
        These made it hard or impossible to use the Java code in older JRE
        platforms like Android, Blackberry, JME, etc., and increased the
        footprint of any binaries that used this code.

        Since that time, we've decided to just use a forked pre-0.4 compiler for
        all of our Java code generation since the design direction in more
        recent releases are diverging too significantly from what we need.
        We're still using the main trunk release for non-Java platforms in our
        distributed API package (http://www.evernote.com/about/developer/api/)
        ... we just generate them separately with different compiler binaries in
        our build process.

        So I don't think THRIFT-1469 addresses the same issue, but I'd also be
        fine if the maintainers just wanted to close THRIFT-872 if no one else
        is using Java in a non-heavy-server configuration.

        Thanks

        Show
        Dave Engberg added a comment - My patch was more specifically intended to reduce the runtime library dependencies introduced to the Java generated code from versions 0.3+ These made it hard or impossible to use the Java code in older JRE platforms like Android, Blackberry, JME, etc., and increased the footprint of any binaries that used this code. Since that time, we've decided to just use a forked pre-0.4 compiler for all of our Java code generation since the design direction in more recent releases are diverging too significantly from what we need. We're still using the main trunk release for non-Java platforms in our distributed API package ( http://www.evernote.com/about/developer/api/ ) ... we just generate them separately with different compiler binaries in our build process. So I don't think THRIFT-1469 addresses the same issue, but I'd also be fine if the maintainers just wanted to close THRIFT-872 if no one else is using Java in a non-heavy-server configuration. Thanks
        Hide
        Roger Meier added a comment -

        Dave,
        as mentioned by Bryan, please rebase the patch and we will commit it!

        Yes, size matters

        Show
        Roger Meier added a comment - Dave, as mentioned by Bryan, please rebase the patch and we will commit it! Yes, size matters
        Hide
        Harvey Bhatt added a comment -

        Can someone fix this? I want to use Hadoop and evernote api together and its impossible without doing weird classloader tricks with the incompatible thrift libraries or separating one of the libraries out to another server and using remote calls to execute commands.

        Show
        Harvey Bhatt added a comment - Can someone fix this? I want to use Hadoop and evernote api together and its impossible without doing weird classloader tricks with the incompatible thrift libraries or separating one of the libraries out to another server and using remote calls to execute commands.

          People

          • Assignee:
            Unassigned
            Reporter:
            Dave Engberg
          • Votes:
            3 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development