HBase
  1. HBase
  2. HBASE-5958

Replace ByteString.copyFrom with new ByteString()

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Invalid
    • Affects Version/s: 0.95.2
    • Fix Version/s: None
    • Component/s: Performance
    • Labels:
      None

      Description

      ByteString.copyFrom makes a copy of a byte array in case it is changed in other thread.
      In most case, we don't need to worry about that. We should avoid copying the bytes
      for performance issue.

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Hi Jimmy. The constructor you referenced above is private... are you suggesting using reflection to access it or something?

          Show
          Todd Lipcon added a comment - Hi Jimmy. The constructor you referenced above is private... are you suggesting using reflection to access it or something?
          Hide
          Jimmy Xiang added a comment -

          Too bad. I didn't realize it is private. Why is that? It is immutable any way, right?

          Show
          Jimmy Xiang added a comment - Too bad. I didn't realize it is private. Why is that? It is immutable any way, right?
          Hide
          Todd Lipcon added a comment -

          The protobuf people have a strong reticence to allowing ByteString construction without a copy.

          One solution would be to switch to protostuff, which does allow this kind of thing, but would be a bigger change. (though it's still wire-compatible)

          Show
          Todd Lipcon added a comment - The protobuf people have a strong reticence to allowing ByteString construction without a copy. One solution would be to switch to protostuff, which does allow this kind of thing, but would be a bigger change. (though it's still wire-compatible)
          Hide
          Gregory Chanan added a comment -

          Did you look at this Jimmy?

          http://comments.gmane.org/gmane.comp.lib.protocol-buffers.general/6406

          "But yes, if you start with a byte[], and you want a ByteString with the same content, you are going to need to make a copy, because ByteString has to guarantee immutability."

          Show
          Gregory Chanan added a comment - Did you look at this Jimmy? http://comments.gmane.org/gmane.comp.lib.protocol-buffers.general/6406 "But yes, if you start with a byte[], and you want a ByteString with the same content, you are going to need to make a copy, because ByteString has to guarantee immutability."
          Hide
          stack added a comment -

          Boo to buffer copies!

          Show
          stack added a comment - Boo to buffer copies!
          Hide
          Todd Lipcon added a comment -

          In my mind the two possible solutions are :
          1) use protostuff
          2) use my earlier idea of a "data dictionary" at the front of our RPC requests. Essentially, we'd change the RPC request/response mechanism so that before each message (or perhaps after), we send a list of KeyValues (or byte strings). Then in the actual Put/Result/whatever protos, we just use vint32s to refer back to the dictionary. That would allow us to manually send out the KeyValues using a CodedOutputStream rather than having to build a full protobuf which includes them, if that makes sense.

          Show
          Todd Lipcon added a comment - In my mind the two possible solutions are : 1) use protostuff 2) use my earlier idea of a "data dictionary" at the front of our RPC requests. Essentially, we'd change the RPC request/response mechanism so that before each message (or perhaps after), we send a list of KeyValues (or byte strings). Then in the actual Put/Result/whatever protos, we just use vint32s to refer back to the dictionary. That would allow us to manually send out the KeyValues using a CodedOutputStream rather than having to build a full protobuf which includes them, if that makes sense.
          Hide
          stack added a comment -

          Looks like adding protostuff could help elsewhere; it looks like we can generate classes as part of the build w/o having to rely on external pb compiler being on the build path.

          Shouldn't we do data dictionary whether protostuff or not? Seems like a generally good idea (tm)

          Show
          stack added a comment - Looks like adding protostuff could help elsewhere; it looks like we can generate classes as part of the build w/o having to rely on external pb compiler being on the build path. Shouldn't we do data dictionary whether protostuff or not? Seems like a generally good idea (tm)
          Hide
          Elliott Clark added a comment -

          Protostuff is usually a little bit faster as well. https://github.com/eishay/jvm-serializers/wiki/Home/25fd014e66738268670adaf44ff5408ba2244d37

          I haven't personally run those benchmarks in a while and it looks like the most recent are not up yet. But still something to consider.

          Show
          Elliott Clark added a comment - Protostuff is usually a little bit faster as well. https://github.com/eishay/jvm-serializers/wiki/Home/25fd014e66738268670adaf44ff5408ba2244d37 I haven't personally run those benchmarks in a while and it looks like the most recent are not up yet. But still something to consider.
          Hide
          Jimmy Xiang added a comment -

          KeyValue is kind of immutable. Can we make KeyValue based on ByteString, instead of byte[]? If so, we can avoid copying around too.

          Show
          Jimmy Xiang added a comment - KeyValue is kind of immutable. Can we make KeyValue based on ByteString, instead of byte[]? If so, we can avoid copying around too.
          Hide
          Jimmy Xiang added a comment -

          By the way, is it an option to use reflection to access the private constructor? If so, I can have a wrap method to use the private constructor, or the original copyFrom if the private constructor is not accessible. Reflection has overhead of course.

          Show
          Jimmy Xiang added a comment - By the way, is it an option to use reflection to access the private constructor? If so, I can have a wrap method to use the private constructor, or the original copyFrom if the private constructor is not accessible. Reflection has overhead of course.
          Hide
          Todd Lipcon added a comment -

          Makes me nervous to reach in and use the private constructor... do you have some benchmarks that show that there's a noticeable speedup by doing so?

          Show
          Todd Lipcon added a comment - Makes me nervous to reach in and use the private constructor... do you have some benchmarks that show that there's a noticeable speedup by doing so?
          Hide
          Jimmy Xiang added a comment -

          I don't have a benchmark yet. Based on profiling, copyFrom and toByteArray are some hotspots.

          Show
          Jimmy Xiang added a comment - I don't have a benchmark yet. Based on profiling, copyFrom and toByteArray are some hotspots.
          Hide
          Jimmy Xiang added a comment -

          Close it as Invalid.

          Show
          Jimmy Xiang added a comment - Close it as Invalid.

            People

            • Assignee:
              Jimmy Xiang
              Reporter:
              Jimmy Xiang
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development