HIVE-5523: HiveHBaseStorageHandler should pass Kerberos credentials down to HBase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.11.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

      Description

      Running on a secured cluster, I have an HBase table defined as follows:

      CREATE TABLE IF NOT EXISTS pagecounts_hbase (rowkey STRING, pageviews STRING, bytes STRING)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
      TBLPROPERTIES ('hbase.table.name' = 'pagecounts');
      

      and a query to populate that table

      -- ensure hbase dependency jars are shipped with the MR job
      SET hive.aux.jars.path = file:///etc/hbase/conf/hbase-site.xml,file:///usr/lib/hive/lib/hive-hbase-handler-0.11.0.1.3.2.0-111.jar,file:///usr/lib/hbase/hbase-0.94.6.1.3.2.0-111-security.jar,file:///usr/lib/zookeeper/zookeeper-3.4.5.1.3.2.0-111.jar;
      
      -- populate our hbase table
      FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 'en/q%' LIMIT 10;
      

      The reduce tasks fail with what boils down to the following exception:

      Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:263)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
      	at org.apache.hadoop.hbase.security.User.call(User.java:590)
      	at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
      	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444)
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:224)
      	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:313)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
      	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
      	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
      	at $Proxy10.getProtocolVersion(Unknown Source)
      	at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146)
      	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1346)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1305)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1292)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1001)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:896)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:998)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:900)
      	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
      	at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
      	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
      	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:133)
      	at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:83)
      	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
      	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
      	... 17 more
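      The SecureClient/SecureRpcEngine frames above are the 0.94-security RPC path; the failure means the reduce task has no Kerberos ticket or HBase delegation token (and 'kinit' is not possible inside a task). For reference only, a secured client of that era is typically configured with properties along these lines in hbase-site.xml; this is a hedged sketch, and the principal values are placeholders, not taken from this issue:

```xml
<!-- Illustrative 0.94-era client security settings; principals are placeholders. -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/_HOST@EXAMPLE.COM</value>
</property>
```

      Such settings only enable secure RPC on the client; the task additionally needs a delegation token shipped with the job, which is what this issue is about.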
      
      Attachments

      1. Task Logs_ 'attempt_201310110032_0023_r_000000_0'.html
        65 kB
        Nick Dimiduk
      2. HIVE-5523.patch
        3 kB
        Sushanth Sowmyan

        Activity

        Nick Dimiduk created issue -
        Nick Dimiduk added a comment -

        Attaching task log.

        Nick Dimiduk made changes -
        Attachment: Task Logs_ 'attempt_201310110032_0023_r_000000_0'.html [ 12608087 ]
        Sushanth Sowmyan added a comment -

        Per my testing, this actually works currently. From code reading I was initially convinced there was a bug, but the handler does properly initialize HBase delegation tokens for read and write jobs, from both Hive and HCatalog.

        I ran the following:
        To create the table:

        CREATE TABLE sushhbdata(rowkey STRING, pageviews STRING, bytes STRING);
        
        LOAD DATA LOCAL INPATH 'sushhbdata' INTO TABLE sushhbdata;
        
        CREATE TABLE IF NOT EXISTS sushpagecounts_hbase (rowkey STRING, pageviews STRING, bytes STRING)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
        TBLPROPERTIES ('hbase.table.name' = 'sushpagecounts');
        

        To insert data into the table:

        set hive.execution.engine=mr;
        
        FROM sushhbdata INSERT INTO TABLE sushpagecounts_hbase SELECT * where rowkey like 'EN%';
        

        For the actual data, I just randomly generated strings of varying length in all caps; quite a few began with "EN".
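        As a side note, the kind of test data described above can be produced with a short sketch like this (a hypothetical generator, not the one actually used; the column layout assumes a tab-delimited table):

```java
import java.util.Random;

// Hypothetical generator for test data like that described above:
// random all-caps row keys of varying length, plus pageviews/bytes columns.
public class GenData {
    static String randomCaps(Random rnd) {
        int len = 2 + rnd.nextInt(10);            // lengths vary from 2 to 11
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('A' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 20; i++) {
            // tab-separated rowkey, pageviews, bytes -- suitable for LOAD DATA
            System.out.println(randomCaps(rnd) + "\t" + rnd.nextInt(1000) + "\t" + rnd.nextInt(100000));
        }
    }
}
```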

        (Note that I set the execution engine to mr because another, unrelated Tez issue currently blocks this if you don't, and I think Gunther or Sergei are trying to fix that. I also don't need aux jars with our current Hive command line.)

        Still, I have a patch up for this that I'd suggest, to make life a little easier for those who read this section of code in the future.

        Could you please verify?

        Sushanth Sowmyan made changes -
        Attachment: HIVE-5523.patch [ 12636442 ]
        Nick Dimiduk added a comment -

        I'm not sure I understand the enhancement of flipping the input/output logic, but the comment helps clarify things a bit. Will take this for a spin in a few.

        Nick Dimiduk made changes -
        Assignee: Sushanth Sowmyan [ sushanth ]
        Nick Dimiduk added a comment -

        This works, +1. Thanks for digging in.

        Nick Dimiduk made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Sushanth Sowmyan added a comment -

        Thanks!

        The reason I flipped the input/output logic is that Hive always executes the input part. HCat executes input for input jobs and output for output jobs, so for HCat it makes no difference whether we check input or output. Hive, however, has no concept of input versus output and executes a "default", which is the same as input. It therefore felt a bit cleaner to treat output as a special case defaulting to false, rather than treat input as a selector defaulting to true. I'm still not completely happy with it, to be honest.
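        The default-flag distinction described above can be illustrated with a small hypothetical sketch (invented names, not Hive's actual code): with output as a special case defaulting to false, a caller that sets nothing, as Hive effectively does, falls through to the input-token path; with input as a selector defaulting to true, output jobs must remember to turn it off.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the two flag styles discussed above.
// Neither method is Hive code; property names and return values are invented.
public class TokenSetupSketch {
    // Style A: output is the special case, defaulting to false.
    // A caller that sets nothing gets the input-token path.
    static String styleA(Map<String, String> props) {
        boolean isOutput = Boolean.parseBoolean(props.getOrDefault("job.output", "false"));
        return isOutput ? "acquire-output-tokens" : "acquire-input-tokens";
    }

    // Style B: input is a selector defaulting to true, so output jobs must
    // explicitly switch it off -- easy to forget for a caller like Hive.
    static String styleB(Map<String, String> props) {
        boolean isInput = Boolean.parseBoolean(props.getOrDefault("job.input", "true"));
        return isInput ? "acquire-input-tokens" : "acquire-output-tokens";
    }

    public static void main(String[] args) {
        Map<String, String> hiveLike = new HashMap<>(); // Hive sets neither flag
        System.out.println(styleA(hiveLike));           // input path by default
        System.out.println(styleB(hiveLike));           // input path too, but only
                                                        // because the default is true
    }
}
```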

        Ashutosh Chauhan added a comment -

        +1

        Sushanth Sowmyan made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]
        Sushanth Sowmyan added a comment -

        I'm canceling this patch because there is still more code cleanup required here to make things more obvious, and code cleanup is now the point of this patch, since the originally reported issue works without it. HIVE-6915 fixes adding the delegation token for Tez, and in doing so raises the question of whether the current behavior works because it happens to work or because it is supposed to; I lean towards the former.

        Sushanth Sowmyan added a comment -

        Closing as "Not a Problem" without any commits, since the reworking is ultimately unnecessary: all this patch really adds is comments, and the Tez-side issues are solved elsewhere.

        Sushanth Sowmyan made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution: Not a Problem [ 8 ]
        Transitions (time in source status, execution count, last executer, last execution date):
        • Open → Patch Available: 164d 18h 55m, 1, Nick Dimiduk, 25/Mar/14 16:52
        • Patch Available → Open: 22d 3h 17m, 1, Sushanth Sowmyan, 16/Apr/14 20:09
        • Open → Resolved: 218d 5h 44m, 1, Sushanth Sowmyan, 21/Nov/14 01:54

          People

          • Assignee: Sushanth Sowmyan
          • Reporter: Nick Dimiduk
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
            • Updated:
            • Resolved:
