Accumulo / ACCUMULO-1551

Introduce Generic Supertypes to Replace Text

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I wanted to create a new ticket for my thoughts on this. I'd like to introduce a paradigm similar to the object inspectors used in Hive to get data in and out of Accumulo.

      The base motivation for this is that the Accumulo API is inconsistent. It is difficult for application developers to use, and it confuses new developers because Text, CharSequence, and byte[] are used inconsistently to represent the various parts of a key. This is totally unnecessary and is, in my mind, a huge black eye.

      Aside from providing a mechanism that could eventually be used to increase read performance in the client, this would also provide a simpler paradigm for application developers and would accomplish some aspects of ORM, à la Typo and Gora (although distinct from the goals and scope of Gora).

      I've attached an initial pull request/code review outlining how I think the refactoring would work in the scanner. Basically, the old API would be preserved by introducing generic supertypes and a class that allows serialization directly from the ByteSequence objects.

      While it may be true that some people have highly heterogeneous data in their tables, the worst-case scenario here is that you just use the ByteSequences directly. Even in that base case, this will allow substantially simpler access by making the access pattern consistent. In other cases, where a scan covers only a particular column or the data is very homogeneous, the benefit is even greater.

      https://github.com/ekohlwey/accumulo/compare/apache:trunk...ACCUMULO-1551
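
The converter idea in the description can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `RawEntry`, `EntryConverter`, `NamedCount`, and `scan` are hypothetical stand-ins, not classes from Accumulo or from the linked branch. The sketch only shows how one generic converter type can cover both the worst case (raw bytes passed through) and the homogeneous case (every entry mapped to one domain type).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Accumulo classes.
final class ConverterSketch {

    /** Stand-in for a raw key/value pair as it comes off a scan. */
    static final class RawEntry {
        final byte[] row;
        final byte[] value;

        RawEntry(byte[] row, byte[] value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Hypothetical converter: turns one raw entry into an object of type T. */
    interface EntryConverter<T> {
        T convert(RawEntry entry);
    }

    /** Example domain type for a table whose rows are names and values are counts. */
    static final class NamedCount {
        final String name;
        final long count;

        NamedCount(String name, long count) {
            this.name = name;
            this.count = count;
        }
    }

    /** Worst case: heterogeneous data, so hand back the raw bytes untouched. */
    static final EntryConverter<RawEntry> IDENTITY = entry -> entry;

    /** Homogeneous case: row is a UTF-8 name, value is a decimal count. */
    static final EntryConverter<NamedCount> NAMED_COUNT = entry -> new NamedCount(
            new String(entry.row, StandardCharsets.UTF_8),
            Long.parseLong(new String(entry.value, StandardCharsets.UTF_8)));

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Stand-in for a scan: applies the converter to every entry, in order. */
    static <T> List<T> scan(List<RawEntry> entries, EntryConverter<T> converter) {
        List<T> out = new ArrayList<>();
        for (RawEntry entry : entries) {
            out.add(converter.convert(entry));
        }
        return out;
    }

    public static void main(String[] args) {
        List<RawEntry> table = new ArrayList<>();
        table.add(new RawEntry(utf8("apple"), utf8("3")));
        table.add(new RawEntry(utf8("pear"), utf8("12")));

        // Same access pattern for both the typed and the raw view.
        for (NamedCount nc : scan(table, NAMED_COUNT)) {
            System.out.println(nc.name + " -> " + nc.count);
        }
        System.out.println(scan(table, IDENTITY).size() + " raw entries");
    }
}
```

The point of the sketch is that the caller's code has the same shape whether or not the data can be mapped to a domain model; only the converter argument changes.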

          Activity

          Keith Turner added a comment -

          I took a look at the changes. This API change builds around EntryConverter, which converts a Key+Value to an arbitrary Java object. As you mentioned, this approach may not work well w/ heterogeneous data. Typo suffered from the same issue; however, I think Typo was more rigid than EntryConverter. At first glance I think EntryConverter is an improvement over Typo, since it seems more flexible. I think some examples of using the API would make it easier to evaluate and understand.

          Do you plan to address writing data?

          Does anyone know of other APIs that abstract Accumulo's API? Typo and Gora were mentioned in the description. I have also seen Accumulo-Fluent. When experimenting w/ Typo, I took the approach of building a prototype API on top of the existing Accumulo API. I found this to be an easier way to explore the concept.
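
As a rough illustration of prototyping on top of an existing API rather than changing it, the sketch below wraps any `Iterable` of `Map.Entry` pairs and converts each entry lazily. `WrapperSketch` and `ConvertingIterable` are hypothetical names, not taken from Typo or from the proposed patch, and the demo uses plain strings instead of Accumulo's Key and Value so that it runs with the JDK alone. Since an Accumulo scanner is itself iterable over `Map.Entry<Key, Value>`, a wrapper of this shape could in principle sit on top of one without touching the client API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a typed view layered over an existing entry iterable.
final class WrapperSketch {

    /** Wraps a source of key/value entries and converts each one on demand. */
    static final class ConvertingIterable<K, V, T> implements Iterable<T> {
        private final Iterable<Map.Entry<K, V>> source;
        private final Function<Map.Entry<K, V>, T> converter;

        ConvertingIterable(Iterable<Map.Entry<K, V>> source,
                           Function<Map.Entry<K, V>, T> converter) {
            this.source = source;
            this.converter = converter;
        }

        @Override
        public Iterator<T> iterator() {
            final Iterator<Map.Entry<K, V>> it = source.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public T next() {
                    // Conversion happens per entry, so nothing is buffered.
                    return converter.apply(it.next());
                }
            };
        }
    }

    public static void main(String[] args) {
        // Stand-in for the entries an existing scan would return.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<>("r1", "10"));
        rows.add(new SimpleEntry<>("r2", "32"));

        // Typed view: parse each value as an integer.
        Iterable<Integer> counts =
                new ConvertingIterable<>(rows, e -> Integer.parseInt(e.getValue()));

        int total = 0;
        for (int c : counts) {
            total += c;
        }
        System.out.println("total = " + total);
    }
}
```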

          Keith Turner added a comment -

          There is a thread on the dev list related to this ticket w/ subject "Generic Supertypes/Pluggable Client"

          Ed Kohlwey added a comment -

          Keith - yes I am planning to address writing as well. I will try to incorporate some usage examples as I have the opportunity. My goal is to eventually roll this out on the server side as well and provide similar enhancements to the iterator API.

          I think this is a little less rigid than Typo. It gives the user the option (and perhaps suggests to them) that their domain model should either meaningfully interpret all byte arrays, or that they shouldn't store data in the table that can't be converted to their domain model; most importantly, the choice is left to the user. My main goal is to address the "everyone always builds some sort of wrapper around the Accumulo client API" problem by baking it into the API, with sane defaults that will work for 99% of applications, so that people do not have to build a complex wrapper around the API.

          Christopher Tubbs added a comment -

          Ed Kohlwey: I think the branches in github have diverged too much to make sense out of. Do you have an updated version of example code/static diff for your illustration?


            People

            • Assignee:
              Unassigned
              Reporter:
              Ed Kohlwey
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: