Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client
    • Labels: None

      Description

      This proposal outlines an improvement to HBase that provides for a set of types, above and beyond the existing "byte-bucket" strategy. This is intended to reduce user-level duplication of effort, provide better support for 3rd-party integration, and provide an overall improved experience for developers using HBase.

      Attachments

      1. HBASE-8089-types.txt
        10 kB
        Nick Dimiduk
      2. HBASE-8089-types.txt
        9 kB
        Nick Dimiduk
      3. HBASE-8089-types.txt
        9 kB
        Nick Dimiduk
      4. HBASE-8089-types.txt
        6 kB
        Nick Dimiduk
      5. hbase data types WIP.pdf
        617 kB
        Nick Dimiduk

        Issue Links

          Activity

          Nick Dimiduk added a comment -

           I'm beginning to think variable-length encoding for anything but char/byte arrays is an unnecessary micro-optimization. Instead of helping a user pack data via encoding, we should encourage the use of compression.

          Anoop Sam John added a comment -

           ImportTSV is a very good tool for bulk loading. We could add type support to this tool as well: when the MR job reads the file lines and converts them into bytes to store in HBase, these types could be applied. Could we open a sub-issue for that too?

          Nick Dimiduk added a comment -

          Updated spec document. Supported types changed a little, and a spec is outlined for the basics.

          Biggest open questions include:

           • should null support be required for each type, or is it enough to support reading a null marker via the STRUCT/UNION implementation?
           • how to handle String and byte[] types. Orderly goes to great pains to encode values, while Phoenix restricts the context in which they can be used.
          Nick Dimiduk added a comment -

           Updated spec document with definitions for VARCHAR and CHAR. After discussion and deliberation, I decided to roughly follow Orderly's approach. The reasoning: the additional computation imposed by incrementing values and the (slight) storage overhead of explicit termination are a cost worth paying, because this approach places no limitation on where the user can use a {VAR,}CHAR type.
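
           For illustration, a minimal sketch of a terminated, memcmp-comparable VARCHAR-style encoding. This is not the exact layout of the spec document or of Orderly; in particular, rejecting embedded 0x00 bytes and using a plain 0x00 terminator are simplifying assumptions made only for this example.

             import java.nio.charset.StandardCharsets;

             public class TerminatedStringSketch {
               // Encode a String so unsigned byte-wise comparison of the output matches
               // the strings' Unicode code point order, and the value is self-terminating,
               // so it can sit anywhere inside a compound key.
               public static byte[] encode(String s) {
                 byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                 for (byte b : utf8) {
                   if (b == 0x00) {
                     throw new IllegalArgumentException("embedded 0x00 not handled in this sketch");
                   }
                 }
                 byte[] out = new byte[utf8.length + 1];
                 System.arraycopy(utf8, 0, out, 0, utf8.length);
                 out[utf8.length] = 0x00; // terminator sorts below every content byte
                 return out;
               }
             }

           Because the terminator is smaller than any content byte, a value that is a prefix of another ("foo" vs. "foobar") still sorts first when the encodings are embedded in a longer key.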

          Ted Yu added a comment -

          I read the description for VARCHAR and CHAR. Looks good.

          Owen O'Malley added a comment -

          Nick,
           ORC gets a lot of mileage by doing type-specific compression. In particular, the integer columns use a vint representation (protobuf vint encoding) and run-length encoding. The string columns use an adaptive dictionary approach (the writer switches between dictionary and direct encoding based on the initial 100k values). That allows a tighter representation before turning on the relatively expensive zlib, and even tighter encodings when combined with zlib.
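
           For reference, a minimal sketch of the protobuf-style zigzag varint encoding mentioned above (illustrative only; ORC layers run-length encoding on top of this, which is not shown here):

             import java.io.ByteArrayOutputStream;

             public class VarintSketch {
               // Zigzag-maps a signed int onto unsigned values (small magnitudes become
               // small numbers), then writes a base-128 varint: 7 data bits per byte,
               // high bit set on every byte except the last.
               public static byte[] encodeZigZagVarint(int n) {
                 int zz = (n << 1) ^ (n >> 31);
                 ByteArrayOutputStream out = new ByteArrayOutputStream();
                 while ((zz & ~0x7F) != 0) {
                   out.write((zz & 0x7F) | 0x80); // continuation bit
                   zz >>>= 7;
                 }
                 out.write(zz);
                 return out.toByteArray();
               }
             }

           Note that varints of this kind are compact but not memcmp-comparable, which is why they suit value storage and compression better than row keys.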

          Owen O'Malley added a comment -

          You should also look at the other types from Hive:

          • Byte
          • Timestamp
          • List
          • Map
          • Union

          Hive includes a standard serialization library that produces serializations that memcmp into the natural sort order, which it uses for MapReduce key serialization.

          Nick Dimiduk added a comment -

          Hive includes a standard serialization library that produces serializations that memcmp into the natural sort order, which it uses for MapReduce key serialization.

           I didn't know about this feature in Hive; I'll check it out. Thanks for the reference, Owen O'Malley. The memcmp feature is critical for our needs; this is why most existing tools (e.g., protobuf) don't work in this context. Do these Hive formats support NULLs? I'm curious how the trade-off for fixed-width types was handled. How does it handle compound keys? It looks like I have more homework to do.
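
           For context, the standard trick for making a fixed-width signed integer memcmp-comparable is to flip its sign bit and write it big-endian. A minimal sketch, not tied to any particular library:

             public class SortableIntSketch {
               // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto ascending
               // unsigned values, so unsigned byte-wise comparison of the four bytes
               // matches signed integer order.
               public static byte[] encode(int n) {
                 int u = n ^ 0x80000000;
                 return new byte[] {
                   (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
                 };
               }
             }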

          Nick Dimiduk added a comment -

           Updates the definition of the BOOLEAN type to support NULL. Updates the serialized definition for {VAR,}CHAR to also invert the termination byte, thus preserving sort order. Adds a working definition for STRUCT.
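
           A minimal sketch of what a NULL-aware BOOLEAN encoding and descending-order inversion could look like; the byte values below are illustrative only and are not the ones defined in the spec document:

             public class NullableBooleanSketch {
               // One byte per value: NULL sorts first, then false, then true. For a
               // descending sort order the encoded byte is bitwise inverted, the same
               // way the {VAR,}CHAR termination byte is inverted above.
               public static byte encode(Boolean value, boolean descending) {
                 byte b;
                 if (value == null) {
                   b = 0x01;
                 } else if (!value) {
                   b = 0x02;
                 } else {
                   b = 0x03;
                 }
                 return descending ? (byte) ~b : b;
               }
             }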

          Owen O'Malley added a comment -

          Nick,
           The documentation for BinarySortableSerDe is here:

          http://hive.apache.org/docs/r0.10.0/api/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.html

          In Hive, it is only used in MapReduce to cut down the cost of the sort during the shuffle.

          Nick Dimiduk added a comment -

          I like the looks of SQLite4's encoding structure [0]. Specifically, numeric values [1], regardless of type, are directly comparable. I think it could be easily extended to support maps and lists.

          Thoughts?

          [0]: http://sqlite.org/src4/doc/trunk/www/key_encoding.wiki
           [1]: http://sqlite.org/src4/doc/trunk/www/decimal.wiki

          Nick Dimiduk added a comment -

          The advantages I see for following SQLite4 include:

          • Serialized values are marked with their type as an initial byte. This is advantageous as serialized values can be sniffed and deserialized by tools ignorant of the application schema.
          • Numeric types (integral and real numbers) are all normalized to identical encoding. This allows them to be compared directly and provides more flexibility to users.
           • A C language implementation and tools are readily available for validation, providing test scenarios and a near-complete implementation when we're ready to work on the non-JVM client.

           The primary detriment I see with using their encoding is the limitation of disallowing null bytes in Strings. The same restriction applies to blobs except for those used as the last value in a compound key. IIRC, this restriction is identical to that imposed by Phoenix.
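
           To make the first advantage concrete, here is a minimal sketch of a type-tagged encoding in the SQLite4 style. The tag values and the NULL < numeric < text ordering shown here are assumptions for illustration, not the values defined by SQLite4 or by the spec document:

             import java.nio.charset.StandardCharsets;

             public class TaggedEncodingSketch {
               // Illustrative tag bytes; because the tag is the first byte compared,
               // values of different types sort into a fixed order, and a tool that
               // knows nothing of the application schema can sniff the type of any
               // serialized value from its leading byte.
               static final byte TAG_NULL = 0x05;
               static final byte TAG_NUMERIC = 0x15;
               static final byte TAG_TEXT = 0x24;

               public static byte[] encodeText(String s) {
                 byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                 byte[] out = new byte[utf8.length + 1];
                 out[0] = TAG_TEXT;
                 System.arraycopy(utf8, 0, out, 1, utf8.length);
                 return out;
               }
             }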

          Matt Corgan added a comment -

          It sounds well thought out. Are you thinking we'd be able to add custom types on top of their base types?

           The primary detriment I see with using their encoding is the limitation of disallowing null bytes in Strings

          Could we allow nulls in values? I personally don't mind disallowing them in row keys.

          Nick Dimiduk added a comment -

          I don't see why not. We could also cherry-pick the null-safe String and Blob implementations from Orderly if it's a critical feature.

          Nick Dimiduk added a comment -

           I'm leaving a comment here since there are way more watchers on this parent ticket. Check out the patch on HBASE-8201 for an implementation of serialization primitives based on the SQLite4 spec.

          Nick Dimiduk added a comment -

          Attaching my slides from the Hadoop Summit BoF talk per stack's suggestion.

          Matt Corgan added a comment -

           Nick - what other dependencies does your whole new type library have on HBase besides ByteRange? It would be grand if it were a standalone jar that could be used by other projects without importing hbase-specific libs (which then drag in other dependencies). All of this functionality is really cool and is more likely to gain adoption if it's as easy as possible to drop into existing projects.

          Nick Dimiduk added a comment -

          Matt - the only other thing is a dependency on HBase's Bytes in a couple places. I think this can easily be removed. Part of the point of this effort is to have HBase ship a standard implementation that other Hadoop ecosystem components can rely on. If I just wanted something for my own application, I'd use Orderly and be done with it.

          Eli Collins added a comment -

          Hey Nick,

           It might be worth updating this jira to reflect the latest state of the work. IIUC this work is about providing a client-side library that does the order-preserving serialization, which higher-level projects (e.g. Phoenix and Kiji) can use for row keys and column qualifiers. Per the other jiras, cell serialization, defining types, and schema are out of scope. These are left to higher-level systems, which may make different choices (e.g. in terms of how to create compound keys) and may have different type models, but at least will be able to share serialization.

           IMO it's worth considering creating a separate project for this, as it is genuinely useful outside HBase (e.g. container formats) and would benefit from multiple language implementations (the serialization here is language-agnostic, right?), so the HBase project may end up being a clunky place to maintain things.

          Thanks,
          Eli

          Nick Dimiduk added a comment -

          Hi Eli,

          You're right, I've left this ticket untouched while working through the initial subtasks.

          The order-preserving serialization is a critical component of the work. I think this is a feature that HBase absolutely must provide if there's to be any hope for interoperability. I also think the serialization format is necessary but not sufficient. An HBase that ships with an API for describing data types and implementations of a set of common definitions takes the next step in interoperability. By defining the type interface, type implementations provided by 3rd parties become pluggable – it becomes feasible for a user to plug a type from Phoenix into their Kiji application. Systems like Phoenix, Kiji, and HCatalog are all choices for defining and managing schema. It may be the case that HBase should define the schema interfaces as well, but that's definitely beyond the scope here. But if those tools are going to interoperate, they need a common language of types with which to do so. Serialization, IMHO, is insufficient.

          I don't know if there's a new project to be built out of this work. I see no need to create such a thing when the needs and use are not yet proven. The introduction of types in HBase will shake things up enough as it is, let's see how people and projects use them before promoting this stuff to its own project.

           Yes, the serialization formats defined in HBASE-8201 are designed to be language agnostic. It's highly likely that I've missed some critical details here or there in the specification. Time will tell.

          -n

          stack added a comment -

          An HBase that ships with an API for describing data types and implementations of a set of common definitions takes the next step in interoperability.

           So you are thinking more than just a client-side utility lib but an actual facade that does typing (though it is all client-side), as in, for example, a TypedHTable that does something like typedHTable.put(new Put(row).addInteger(12345)) and int i = typedHTable.get(new Get(row).getInteger())? HBase internally is still all byte arrays, but it'd have this new class that made it look like we could exploit this typing info server-side (e.g. better compression)? I suppose I'd have to register a serializer w/ this new TypedHTable too? Would the serializer be per table?
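
           A rough sketch of what such a hypothetical facade could look like. TypedTable, putInteger, and the inline sign-bit-flip encoding are all invented for illustration; only the underlying Table/Put calls are the ordinary byte-oriented client API, and HBase still stores plain byte arrays underneath:

             import java.io.IOException;
             import org.apache.hadoop.hbase.client.Put;
             import org.apache.hadoop.hbase.client.Table;

             public class TypedTable {
               private final Table table;
               private final byte[] family;
               private final byte[] qualifier;

               public TypedTable(Table table, byte[] family, byte[] qualifier) {
                 this.table = table;
                 this.family = family;
                 this.qualifier = qualifier;
               }

               // Serialize the int with an order-preserving encoding (a stand-in for
               // whatever the type library provides) and delegate to the byte-oriented API.
               public void putInteger(byte[] row, int value) throws IOException {
                 int u = value ^ 0x80000000; // sign-bit flip, written big-endian
                 byte[] encoded = new byte[] {
                   (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
                 };
                 table.put(new Put(row).addColumn(family, qualifier, encoded));
               }
             }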

          Nick Dimiduk added a comment -

           stack: This is a possible direction that I have not thoroughly explored. Some discussion around client-side ease-of-use has started on HBASE-7941. I've had a couple of hallway conversations about bringing type awareness into the RegionServer, but none of it is concrete.

          Eli Collins added a comment -

          Thanks Nick. Probably worth a broader discussion. The view in HBASE-7941 that HBase is a database and should therefore provide types is pretty different from Bigtable's design:

          "In many ways, Bigtable resembles a database: it shares many implementation strategies with databases... but Bigtable provides a different interface than such systems. Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage... Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas."

           While you could preserve the flexibility here while providing one implementation of a type model in HBase, I think it's an explicit, existing design decision to have HBase support multiple distinct type models in higher-level systems. And if those systems want to share code and type models that's great, but IMO HBase is a storage system w/o an explicit type model by design, and we start to lose the above benefits as we bring type awareness into core HBase components like the RS.

          Andrew Purtell added a comment -

          What is currently actionable for 0.98, a timeframe of a few weeks... ?

          Nick Dimiduk added a comment -

          Of these subtasks, probably performance improvements (HBASE-8694) and type comparisons (HBASE-8863) could be tackled by a willing individual. Client-side API enhancements will take some time for discussion. I think the ImportTSV stuff should be tackled after we've defined a language for type declaration (similar to what we have for Filters in ParseFilter).
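
           To illustrate the ParseFilter-style idea, a hypothetical registry that maps declaration names to type factories; the names and the shape of the registry are assumptions for this sketch, not an agreed-upon design:

             import java.util.HashMap;
             import java.util.Map;
             import java.util.function.Supplier;

             public class TypeRegistrySketch<T> {
               // Map a declaration string such as "INT32" or "VARCHAR" to a factory for
               // the corresponding type implementation, much as ParseFilter maps filter
               // names to Filter classes.
               private final Map<String, Supplier<T>> registry = new HashMap<>();

               public void register(String name, Supplier<T> factory) {
                 registry.put(name.toUpperCase(), factory);
               }

               public T create(String declaration) {
                 Supplier<T> factory = registry.get(declaration.toUpperCase());
                 if (factory == null) {
                   throw new IllegalArgumentException("unknown type: " + declaration);
                 }
                 return factory.get();
               }
             }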

          Andrew Purtell added a comment -

          Unscheduling from 0.98


            People

            • Assignee:
              Nick Dimiduk
            • Reporter:
              Nick Dimiduk
            • Votes:
              0
            • Watchers:
              34
