Hive / HIVE-1245

allow access to values stored as non-strings in HBase

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

        Activity

        John Sichi created issue -
        John Sichi made changes -
        Link: This issue relates to HIVE-705 [ HIVE-705 ]
        John Sichi added a comment - Oops, here's the permalink: http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201003.mbox/%3C9A53DDE1FE082F4D952FDF20AC87E21F021F3C1A@exchange2.t8design.com%3E
        John Sichi added a comment -

        As part of this, should also test and document the existing support for serializing arrays and structs, and also expose the control over when to use JSON vs delimited for these and maps.

        hive> CREATE TABLE complex(
        key string,
        a array<string>,
        s struct<col1 : int, col2 : int>)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES (
        "hbase.columns.mapping" = "cf:a, cf:s"
        );
        OK
        hive>
        INSERT OVERWRITE TABLE complex
        SELECT bar, array('x', 'y', 'z'), struct(100, 200)
        FROM pokes
        WHERE foo=497;
        ...
        OK
        hive>
        SELECT * FROM complex;
        OK
        val_497 ["x","y","z"] {"col1":100,"col2":200}

        hbase(main):003:0> scan 'complex'
        ROW COLUMN+CELL
        val_497 column= cf:s, timestamp=1275419258650, value=100\x02200
        val_497 column=cf:a, timestamp=1275419258650, value=x\x02y\x02z
        1 row(s) in 1.0250 seconds
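The \x02 bytes in the scan output are Hive's default collection-item delimiter (ASCII STX) for the delimited encoding. A minimal sketch of decoding such cells by hand (assuming the default delimiters shown above):

```python
# Hive's default delimited encoding for nested values stored in HBase:
# collection items (array elements, struct fields) are separated by \x02.
raw_array = b"x\x02y\x02z"
decoded = [item.decode("utf-8") for item in raw_array.split(b"\x02")]
print(decoded)  # ['x', 'y', 'z']

raw_struct = b"100\x02200"
col1, col2 = (int(field) for field in raw_struct.split(b"\x02"))
print(col1, col2)  # 100 200
```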

        John Sichi made changes -
        Fix Version/s 0.6.0 [ 12314524 ]
        John Sichi added a comment -

        For atomic types, we could extend the column-level mapping directive to allow for three options:

        • string
        • binary
        • use table-level default

        So where we currently have a:b, we would support a:b:string and a:b:binary.

        The table-level default would be set in a separate serde property hbase.storedtype.atomic, with a default value of string for backwards-compatibility.

        Then something similar could be done for compound types, but with json and delimited as the options? I haven't thought through all the combinations, or what to do with column families.
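A sketch of what the proposed atomic-type directives might look like in DDL (hypothetical syntax based on the comment above, not committed behavior; the table and column names are made up):

```sql
-- Hypothetical: a:b:string forces string storage, a:b:binary forces
-- binary, and a plain a:b falls back to the table-level default set
-- by the proposed hbase.storedtype.atomic serde property.
CREATE TABLE typed (
  key STRING,
  n INT,
  f DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:n:binary,cf:f:string",
  "hbase.storedtype.atomic" = "string"
);
```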

        Basab Maulik made changes -
        Assignee John Sichi [ jvs ] Basab Maulik [ bkm ]
        Basab Maulik made changes -
        Link This issue is blocked by HIVE-1634 [ HIVE-1634 ]
        Basab Maulik added a comment -

        HIVE-1634 addresses the design for the case of primitive types.

        The implementation is very similar to the proposed design above. It uses the serde property:

        "hbase.columns.storage.types" = ",b,b,b,b,b,s,:s,b:s,b,b"

        for specifying column level storage options, where '-' stands for the table default, 'b' for binary, and 's' for string/UTF8 storage.

        The table property

        "hbase.table.default.storage.type" = "binary"

        can be used to specify a table level default.
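Putting the two properties together, a table definition using this scheme might look like the following (a sketch based on the description above, not a verified excerpt from HIVE-1634; table and column names are made up):

```sql
CREATE TABLE mixed (
  key STRING,
  cnt BIGINT,
  name STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:cnt,cf:name",
  -- one entry per mapped column: '-' = table default, 'b' = binary, 's' = string
  "hbase.columns.storage.types" = "-,b,s"
)
TBLPROPERTIES ("hbase.table.default.storage.type" = "binary");
```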

        zengchuan added a comment -

        I'm new to HBase and Hive. I created a table in HBase and added array data into it.

        public static void createTable(String tablename) throws IOException {
            HBaseAdmin admin = new HBaseAdmin(hbaseConfig);
            if (admin.tableExists(tablename)) {
                System.out.println("table Exists!!!");
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(tablename);
                tableDesc.addFamily(new HColumnDescriptor("dom"));
                admin.createTable(tableDesc);
            }
        }

        public static void addData(String tablename) throws IOException {
            HTable table = new HTable(hbaseConfig, tablename);
            Put put = new Put(Bytes.toBytes(String.valueOf(i)));
            List<String> a = new ArrayList<String>();
            a.add("domain1");
            a.add("domain2");
            Object obj = doType(hbaseConfig, a, List.class);
            Writable w = new HbaseObjectWritable(obj);
            byte[] depthMapByteArray = WritableUtils.toByteArray(w);
            put.add(Bytes.toBytes("dom"), Bytes.toBytes("domain"), depthMapByteArray);
            table.put(put);
        }

        private static Object doType(Configuration conf, Object value, Class<?> clazz)
                throws IOException {
            ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(byteStream);
            HbaseObjectWritable.writeObject(out, value, clazz, conf);
            out.close();
            ByteArrayInputStream bais = new ByteArrayInputStream(byteStream.toByteArray());
            DataInputStream dis = new DataInputStream(bais);
            Object product = HbaseObjectWritable.readObject(dis, conf);
            dis.close();
            return product;
        }

        In Hive I create the table:

        CREATE EXTERNAL TABLE hbase_table_2(row_key int, domain Array<String>)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,dom:domain")
        TBLPROPERTIES("hbase.table.name" = "table2");

        In hbase_table_2, the domain Array<String> column does not read back correctly. Why?


          People

          • Assignee: Basab Maulik
          • Reporter: John Sichi
          • Votes: 0
          • Watchers: 6
