Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-2589 Create new client API
  3. ACCUMULO-2445

API should use collection-of-bytes representation instead of Text

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.1, 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: client
    • Labels:
      None

      Description

      Right now we use Text as a container for bytes rather than as a UTF8 representation of a string. This results in some gymnastics when people wish to use byte sequences that are not valid UTF8 (see ACCUMULO-2437).

      We should use some representation of plain bytes (ideally byte[] or ByteBuffer) and deprecate the use of Text so we can remove it later.

        Activity

        Hide
        ctubbsii Christopher Tubbs added a comment -

        Big +1 to Josh Elser's last comment. I think we want to move cautiously here.

        Fluo Bytes is a WIP, that could be suitable to replace Text in future API changes, but it's not yet done. Accumulo has its internal ByteSequence, and Fluo has its own current Bytes, and there are others. I think Fluo Bytes will be a good candidate in the future, but in the meantime, I don't know that we have any good options that aren't just going to create churn.

        Show
        ctubbsii Christopher Tubbs added a comment - Big +1 to Josh Elser 's last comment. I think we want to move cautiously here. Fluo Bytes is a WIP, that could be suitable to replace Text in future API changes, but it's not yet done. Accumulo has its internal ByteSequence, and Fluo has its own current Bytes, and there are others. I think Fluo Bytes will be a good candidate in the future, but in the meantime, I don't know that we have any good options that aren't just going to create churn.
        Hide
        elserj Josh Elser added a comment -

        Ahhhh, great. That makes more sense.

        I'll leave you to it then! If you'll accept some unsolicited advice, I'd take some time to digest that blog-post about the idea of fluo-bytes before diving into changing Accumulo client API. A few "feet" of planning saves "miles" in these kinds of scenarios.

        Show
        elserj Josh Elser added a comment - Ahhhh, great. That makes more sense. I'll leave you to it then! If you'll accept some unsolicited advice, I'd take some time to digest that blog-post about the idea of fluo-bytes before diving into changing Accumulo client API. A few "feet" of planning saves "miles" in these kinds of scenarios.
        Hide
        milleruntime Michael Miller added a comment -

        I spoke with Christopher Tubbs and figured it out. Fluo-bytes is WIP with the idea of becoming a stand alone project but currently exists as a branch in his fork.

        I was thinking we could make this change to the code in parallel with the new API but looks like its probably not worth it right now.

        Show
        milleruntime Michael Miller added a comment - I spoke with Christopher Tubbs and figured it out. Fluo-bytes is WIP with the idea of becoming a stand alone project but currently exists as a branch in his fork. I was thinking we could make this change to the code in parallel with the new API but looks like its probably not worth it right now.
        Hide
        elserj Josh Elser added a comment -

        I'm confused – fluo-bytes is empty? There was never actually any code? Just an idea?

        I would just be concerned about using some class that we provide directly without making sure it's really fleshed out. Mentioning fluo was just because I assumed that such a "stable" bytes-centric API was already existing which we could leverage. If that doesn't exist, we would just have to make sure what we have now is stable/thorough.

        Show
        elserj Josh Elser added a comment - I'm confused – fluo-bytes is empty? There was never actually any code? Just an idea? I would just be concerned about using some class that we provide directly without making sure it's really fleshed out. Mentioning fluo was just because I assumed that such a "stable" bytes-centric API was already existing which we could leverage. If that doesn't exist, we would just have to make sure what we have now is stable/thorough.
        Hide
        milleruntime Michael Miller added a comment -

        Maybe you were thinking of ByteSequence in Accumulo? Specifically ArrayByteSequence which appears to be immutable... we also have a MutableByteSequence. Keith Turner recently fixed a bug with ByteBuffer in ArrayByteSequence, see ACCUMULO-4113. Perhaps ArrayByteSequence is the best alternative we currently have for Text.

        Show
        milleruntime Michael Miller added a comment - Maybe you were thinking of ByteSequence in Accumulo? Specifically ArrayByteSequence which appears to be immutable... we also have a MutableByteSequence. Keith Turner recently fixed a bug with ByteBuffer in ArrayByteSequence, see ACCUMULO-4113 . Perhaps ArrayByteSequence is the best alternative we currently have for Text.
        Hide
        milleruntime Michael Miller added a comment -

        Yeah the Fluo guys created this to fill the gap: https://github.com/apache/fluo-bytes

        Show
        milleruntime Michael Miller added a comment - Yeah the Fluo guys created this to fill the gap: https://github.com/apache/fluo-bytes
        Hide
        elserj Josh Elser added a comment -

        I can echo Christopher's comments about ByteBuffer being a pain to work with. I wouldn't want to push that onto users as the only option. Didn't Keith Turner make some kind of helper class for bytes that we could leverage?

        I did want to mention: there is a bit of value that we could recognize server-side by using ByteBuffers, notably around the ability to create "off-heap" (direct) ByteBuffers. Making this kind of change internally would pave the way for future improvements which can help limit the impact of JVM GC and improve our overall latency.

        Show
        elserj Josh Elser added a comment - I can echo Christopher's comments about ByteBuffer being a pain to work with. I wouldn't want to push that onto users as the only option. Didn't Keith Turner make some kind of helper class for bytes that we could leverage? I did want to mention: there is a bit of value that we could recognize server-side by using ByteBuffers, notably around the ability to create "off-heap" (direct) ByteBuffers. Making this kind of change internally would pave the way for future improvements which can help limit the impact of JVM GC and improve our overall latency.
        Hide
        milleruntime Michael Miller added a comment -

        I was starting to look into this and changing Text over to byte[] seems to be simpler than ByteBuffer. I was only looking at KeyExtent though, which seemed to be a good place to start.

        Show
        milleruntime Michael Miller added a comment - I was starting to look into this and changing Text over to byte[] seems to be simpler than ByteBuffer. I was only looking at KeyExtent though, which seemed to be a good place to start.
        Hide
        mgrimmer Martin Grimmer added a comment -

        Got into the mentioned issues, so this would help writing applications more "clear".

        Show
        mgrimmer Martin Grimmer added a comment - Got into the mentioned issues, so this would help writing applications more "clear".
        Hide
        ctubbsii Christopher Tubbs added a comment -

        After using ByteBuffer a bit, I think it is extremely unfriendly to work with. I'm leaning towards something a bit simpler, more like our current ByteSequence. Of course, making it easily convertible to/from ByteBuffer would be helpful, so I'd want to do that. What do you guys think?

        Show
        ctubbsii Christopher Tubbs added a comment - After using ByteBuffer a bit, I think it is extremely unfriendly to work with. I'm leaning towards something a bit simpler, more like our current ByteSequence. Of course, making it easily convertible to/from ByteBuffer would be helpful, so I'd want to do that. What do you guys think?
        Hide
        vines John Vines added a comment -

        +1 ByteBuffer

        Show
        vines John Vines added a comment - +1 ByteBuffer
        Hide
        busbey Sean Busbey added a comment -

        Strong preference given for ByteBuffer

        Show
        busbey Sean Busbey added a comment - Strong preference given for ByteBuffer

          People

          • Assignee:
            ctubbsii Christopher Tubbs
            Reporter:
            busbey Sean Busbey
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development