Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client
    • Labels: None

      Description

      Access to the DataType implementations introduced in HBASE-8693 is currently limited to consumers of the Java API. It is not easy to specify a data type in non-Java environments, such as the HBase shell, the REST or Thrift gateways, command-line arguments to our utility MapReduce jobs, or in integration points such as a (hypothetical extension to) Hive's HBaseStorageHandler. See HBASE-8593 and HBASE-10071 for examples where this limitation impedes progress.

      I propose the implementation of a type definition DSL, similar to the language defined for Filters in HBASE-4176. Implementing this in core HBase allows it to be reused in all of the situations described above. The parser for this DSL must support arbitrary type extensions, just as the Filter parser allows new Filter types to be registered at runtime. A sketch of what such a parser might look like follows.
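      To make the shape of this concrete, here is a minimal sketch of such a parser. Nothing in it beyond the DataType interface is existing HBase API: the DataTypeParser class, its register() and parse() methods, and the reflection-based resolution are all invented for illustration, mirroring how ParseFilter.registerFilter admits new Filter types at runtime.

        import java.util.HashMap;
        import java.util.Map;

        import org.apache.hadoop.hbase.types.DataType;

        /**
         * Hypothetical sketch only, not actual HBase API. Resolves a type
         * definition string to a DataType instance and supports runtime
         * registration of short names, as ParseFilter does for Filters.
         */
        public class DataTypeParser {
          private final Map<String, String> registry = new HashMap<String, String>();

          /** Register a short name for a DataType implementation class. */
          public void register(String name, String className) {
            registry.put(name, className);
          }

          /**
           * Resolve "package.Class" or a registered short name to an instance.
           * Real ordered types expose static ASCENDING/DESCENDING members
           * rather than public constructors, so the reflection here is
           * purely illustrative.
           */
          public DataType<?> parse(String spec) throws Exception {
            String className = registry.containsKey(spec) ? registry.get(spec) : spec;
            return (DataType<?>) Class.forName(className).newInstance();
          }
        }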


          Activity

          Nick Dimiduk added a comment -

          Such a parser should probably allow registering aliases for a more complete type definition in order to make interactive experiences (such as HBASE-10071) more palatable. One could establish aliases for the session, say long => OrderedInt64#DESCENDING, decimal => OrderedNumeric, and my_struct => some composite Struct declaration, as sketched below.
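          For instance, with the hypothetical DataTypeParser sketched in the description, session aliases might be registered like so. None of these names are real HBase API, and the handling of an order suffix such as "#DESCENDING" is elided here.

            DataTypeParser parser = new DataTypeParser();
            // Session-scoped aliases; a fuller grammar would let the target
            // definition carry an order suffix.
            parser.register("long", "org.apache.hadoop.hbase.types.OrderedInt64");
            parser.register("decimal", "org.apache.hadoop.hbase.types.OrderedNumeric");
            // "my_struct" would alias a composite Struct declaration once the
            // grammar grows struct syntax.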

          stack added a comment -

          Tell me more about how this would work?

          hbase:int would map to org.apache.hadoop.hbase.types.RawInt in the DSL

          Each language would have to have an interpreter for the DSL?

          There is some overlap with how types are called out in avro/pb IDLs?

          Nick Dimiduk added a comment -

          I haven't worked through a prototype yet, so I don't know exactly. The DSL we have for exposing filters is parsed once, in Java (using ParseFilter), by the shell or Thrift service (I guess the REST service doesn't support this yet). The user would provide the type mapping as a configuration string and let whatever component interacts with the HTable route the provided data literals to the correct DataType instances, roughly as sketched below.
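          Something like the following. The DataType.encode and PositionedByteRange calls are real HBASE-8693 API; the notion of resolving the type from a configuration string is the part under discussion, so the literal assignment stands in for a hypothetical parse step.

            import org.apache.hadoop.hbase.types.OrderedInt64;
            import org.apache.hadoop.hbase.util.PositionedByteRange;
            import org.apache.hadoop.hbase.util.SimplePositionedByteRange;

            // Resolved once from the configured definition string; this
            // assignment stands in for a parser.parse(...) call.
            OrderedInt64 type = OrderedInt64.DESCENDING;

            // Every user-supplied literal is then routed through the same
            // DataType instance for encoding.
            PositionedByteRange buf =
                new SimplePositionedByteRange(type.encodedLength(42L));
            type.encode(buf, 42L);
            byte[] encoded = buf.getBytes();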

          One example consumer is the Hive metastore. A table is defined in the metastore with a column mapping, similar to today, that maps each metastore table column to an HBase table column. In addition to the column mapping, a type specification is also provided. This would be an Expression in the DSL we're discussing. The StorageHandler would be responsible for honoring this additional component in the mapping (see the sketch below). How exactly we ensure the metastore type can be converted to/from the HBase DataType is still an open question. I hope to learn from Phoenix on this, hence I deferred that work out to HBASE-8863.
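          For illustration, the table properties might carry the type specification alongside the column mapping like so. Only hbase.columns.mapping exists in Hive's HBase integration today; the hbase.columns.types key and the definition strings it holds are invented here.

            import java.util.Properties;

            Properties tblProps = new Properties();
            // Real property: maps metastore columns to HBase columns.
            tblProps.setProperty("hbase.columns.mapping", ":key,f:c1,f:c2");
            // Hypothetical property: one DSL expression per mapped column.
            tblProps.setProperty("hbase.columns.types",
                "OrderedString,OrderedInt64/DESCENDING,OrderedNumeric");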

          More concretely, I imagine this DSL is relatively simple. A complete type definition might be as simple as package.class[/ORDER]. We'll need to add any necessary API to DataType to support construction from the parser. There may also be some built-in named definitions, such as "raw" or "ordered-bytes", for which we ship a known mapping between Java types and HBase DataType implementations. This would be a convenience for consumers of HTable; I don't know how this would play into a metastore implementation.
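          Under that shape, definition strings might look like the following; the /ORDER suffix and the built-in names are illustrative, not an implemented grammar.

            // Fully-qualified class, with an optional order suffix.
            String ordered = "org.apache.hadoop.hbase.types.OrderedInt64/DESCENDING";
            String raw     = "org.apache.hadoop.hbase.types.RawLong";
            // Built-in named definition shipping a known Java-type mapping.
            String builtin = "ordered-bytes";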

          The only place where potential overlap with Avro/Protobuf comes in is with Struct. I'm not convinced this is very complicated either: just a sequence of types, with syntax for specifying an optional element (see the sketch below). There's no concept of "schema versioning" in Struct; there's no room for it in a place where encoded ordering is the primary concern.
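          For reference, here is what such a struct definition would resolve to in the existing Java API. StructBuilder and the member types below are real HBASE-8693 code; only the idea of reaching them from a DSL string is hypothetical.

            import org.apache.hadoop.hbase.types.OrderedInt64;
            import org.apache.hadoop.hbase.types.OrderedString;
            import org.apache.hadoop.hbase.types.Struct;
            import org.apache.hadoop.hbase.types.StructBuilder;

            // A Struct is just an ordered sequence of member types.
            Struct pair = new StructBuilder()
                .add(OrderedString.ASCENDING)  // e.g. an entity id prefix
                .add(OrderedInt64.DESCENDING)  // e.g. a timestamp, newest first
                .toStruct();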

          Nick Dimiduk added a comment -

          Looks like Navis has been thinking about how to specify a composite as well.


            People

            • Assignee: Unassigned
            • Reporter: Nick Dimiduk
            • Votes: 0
            • Watchers: 8

              Dates

              • Created:
              • Updated:
