Hive / HIVE-5523

HiveHBaseStorageHandler should pass Kerberos credentials down to HBase

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

      Description

      Running on a secured cluster, I have an HBase table defined as follows:

      CREATE TABLE IF NOT EXISTS pagecounts_hbase (rowkey STRING, pageviews STRING, bytes STRING)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
      TBLPROPERTIES ('hbase.table.name' = 'pagecounts');
      

      and a query to populate that table

      -- ensure hbase dependency jars are shipped with the MR job
      SET hive.aux.jars.path = file:///etc/hbase/conf/hbase-site.xml,file:///usr/lib/hive/lib/hive-hbase-handler-0.11.0.1.3.2.0-111.jar,file:///usr/lib/hbase/hbase-0.94.6.1.3.2.0-111-security.jar,file:///usr/lib/zookeeper/zookeeper-3.4.5.1.3.2.0-111.jar;
      
      -- populate our hbase table
      FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 'en/q%' LIMIT 10;
      

      The reduce tasks fail with what boils down to the following exception:

      Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:263)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
      	at org.apache.hadoop.hbase.security.User.call(User.java:590)
      	at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
      	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444)
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:224)
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:313)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
      	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
      	at $Proxy10.getProtocolVersion(Unknown Source)
      	at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1346)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1305)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1292)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1001)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:896)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:998)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:900)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
      	at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
      	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
      	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:133)
      	at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:83)
      	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
      	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
      	... 17 more
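For context, the usual shape of a fix for this class of failure is to obtain an HBase delegation token at job-submission time, while the submitting client still holds a Kerberos TGT, and place it in the job's credentials so tasks authenticate with the token instead of attempting their own SASL/Kerberos handshake. The sketch below uses the HBase 0.94 security-build APIs; the class name and placement are illustrative assumptions, not the attached HIVE-5523.patch:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.security.UserGroupInformation;

public class HBaseDelegationTokenSketch {

  /** Hypothetical hook: would be called from the storage handler's
   *  configureJobConf() before the MR job is submitted. */
  static void addHBaseDelegationToken(JobConf jobConf) throws IOException {
    if (User.isHBaseSecurityEnabled(jobConf)) {
      try {
        // Contacts the HBase token provider (requires a live TGT on the
        // client) and adds the resulting delegation token to the job's
        // credentials, where map/reduce tasks can pick it up.
        TokenUtil.obtainTokenForJob(jobConf, UserGroupInformation.getCurrentUser());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Error obtaining HBase delegation token", e);
      }
    }
  }
}
```

The key point is the timing: tasks on a secured cluster have no TGT of their own, so any token acquisition has to happen on the client before submission.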
      
      1. Task Logs_ 'attempt_201310110032_0023_r_000000_0'.html (65 kB, Nick Dimiduk)
      2. HIVE-5523.patch (3 kB, Sushanth Sowmyan)

        Activity

        Sushanth Sowmyan added a comment -

        I'm canceling this patch because I feel there is still more code cleanup required here to make things more obvious, and code cleanup is now the point of this patch, since the originally reported issue works without it. HIVE-6915 fixes adding the delegation token for Tez, and in doing so raises the question of whether the current approach works because it happens to work or because that is how it is supposed to work; I'm leaning towards the former.

        Ashutosh Chauhan added a comment -

        +1

        Sushanth Sowmyan added a comment -

        Thanks!

        The reason I flipped the input/output logic is that Hive executes the input part "always". HCat executes input for reads and output for writes, so for HCat it makes no difference whether we check for input or output. Hive, by contrast, has no concept of input versus output and executes a "default", which is the same as input. It therefore felt a bit cleaner to treat output as a special case defaulting to false, rather than treat input as a selector defaulting to true. I'm still not completely happy with it, to be honest.

        Nick Dimiduk added a comment -

        This works, +1. Thanks for digging in.

        Nick Dimiduk added a comment -

        I'm not sure I understand the enhancement of flipping the input/output logic, but the comment helps clarify things a bit. Will take this for a spin in a few.

        Sushanth Sowmyan added a comment -

        Per my testing, this is actually working currently. From code reading I was initially convinced there was a bug, but it does seem to properly initialize HBase delegation tokens for read and write jobs, from both Hive and HCatalog.

        I ran the following:
        To create the table:

        CREATE TABLE sushhbdata(rowkey STRING, pageviews STRING, bytes STRING);
        
        LOAD DATA LOCAL INPATH 'sushhbdata' INTO TABLE sushhbdata;
        
        CREATE TABLE IF NOT EXISTS sushpagecounts_hbase (rowkey STRING, pageviews STRING, bytes STRING)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
        TBLPROPERTIES ('hbase.table.name' = 'sushpagecounts');
        

        To insert data into the table:

        set hive.execution.engine=mr;
        
        FROM sushhbdata INSERT INTO TABLE sushpagecounts_hbase SELECT * where rowkey like 'EN%';
        

        For the actual data, I just randomly generated strings of varying length in all caps, there were quite a few that began with "EN".

        (Note that I set the execution engine to mr because there is another, unrelated Tez issue that currently blocks this if you don't, which I think Gunther or Sergei are trying to fix. I also don't need the aux jars with our current Hive command line.)

        I have a patch up for this anyway that I'd suggest, to make life a little easier for those who read this section of code in the future.

        Could you please verify?

        Nick Dimiduk added a comment -

        Attaching task log.


          People

          • Assignee: Sushanth Sowmyan
          • Reporter: Nick Dimiduk
          • Votes: 1
          • Watchers: 4
