Pig / PIG-3206

HBaseStorage does not work with Oozie pig action and secure HBase

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.12.0, 0.11.1
    • Component/s: None
    • Labels:
      None

      Description

      HBaseStorage always tries to fetch a delegation token for a secure HBase cluster. When Pig is launched through Oozie, this fails because no TGT is available in the map job. In that case, HBaseStorage should instead reuse the HBase delegation token in the JobConf passed to Pig through the mapreduce.job.credentials.binary property.

      1. PIG-3206-1.patch
        2 kB
        Rohini Palaniswamy

        Activity

        Rohini Palaniswamy created issue -
        Rohini Palaniswamy added a comment -

        Resorted to fetching the delegation token only if Kerberos credentials are present. The other approach is to check, using TokenSelector, whether the JobConf already has a delegation token for that particular HBase cluster, and to fetch the token only if it does not. But that requires getting the HBase cluster ID to match the service name, and there is no clean API available in HBase to fetch the cluster ID.
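
        The guard described in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: TokenFetchGuard, CurrentUser, and shouldFetchToken are hypothetical names, and CurrentUser is an assumed stand-in for whatever object reports whether Kerberos credentials are present.

```java
/** Hypothetical sketch: decide whether to request a new HBase delegation token. */
final class TokenFetchGuard {

    /** Hypothetical stand-in for the current user's security state. */
    interface CurrentUser {
        boolean hasKerberosCredentials();
    }

    /**
     * Returns true only when HBase security is set to "kerberos" and the
     * current user holds Kerberos credentials (a TGT).
     */
    static boolean shouldFetchToken(String hbaseAuthentication, CurrentUser user) {
        if (!"kerberos".equalsIgnoreCase(hbaseAuthentication)) {
            return false; // insecure cluster: no delegation token is needed
        }
        // Without a TGT (for example inside an Oozie launcher), skip the fetch
        // and rely on the tokens already present in the job credentials.
        return user.hasKerberosCredentials();
    }
}
```

        For example, shouldFetchToken("kerberos", () -> false) returns false, which corresponds to the Oozie case where no new token is requested.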

        Rohini Palaniswamy made changes -
        Attachment: (none) -> PIG-3206-1.patch [ 12570798 ]
        Rohini Palaniswamy made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Affects Version/s: (none) -> 0.10.1 [ 12320547 ]
        Fix Version/s: (none) -> 0.12 [ 12323380 ]
        Dmitriy V. Ryaboy added a comment -

        Should we be catching the NoSuchMethodException for all those methods unavailable in 0.92 and 0.20.2? We'll definitely encounter it on m2.invoke in Hadoop 0.20.2, according to the comment.

        Looks like now that the Class.forName line moved into an if block, the corresponding ClassNotFoundException should move in there as well.

        The description in this JIRA says "it should try and reuse the hbase delegation token in JobConf" if the kerberos token is not available. Where does this reuse happen? I am not sure how this patch solves the problem, but then I don't really know how the kerberos integration in HBaseStorage works in the first place.

        Maybe a test would be helpful?

        Dmitriy V. Ryaboy added a comment -

        Cancelling patch to clear from review queue (feel free to set patch available if you have a new patch or if this one is sufficient and I am wrong).

        Should we make the fix apply on 0.11.1 as well?

        Dmitriy V. Ryaboy made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Rohini Palaniswamy added a comment -

        Dmitriy,
        I had added a comment at the beginning of the first if block to clarify this. If security is not configured in hbase-site.xml, the whole block is not executed. So we don't expect the block to be executed with Hadoop 0.20.2 or HBase 0.90.x, which do not have security. We are using reflection so that the HBaseStorage class, when compiled against Hadoop 1.x but run against Hadoop 0.20.2 or HBase 0.90.x, does not throw NoSuchMethodError for UserGroupInformation methods when executing the addHBaseDelegationToken method.
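
        The reasoning behind the reflection can be illustrated with a small sketch. A direct call to a method that is missing at run time fails with NoSuchMethodError, whereas a reflective lookup reports the missing method as a checked NoSuchMethodException that the caller can handle. The example below uses only JDK classes; ReflectiveCall, invokeNoArg, and classAvailable are hypothetical names, and the classes passed in are placeholders for the Hadoop and HBase classes discussed here.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical helper showing reflective calls that tolerate missing APIs. */
final class ReflectiveCall {

    /**
     * Invokes a public no-argument method by name. Returns an empty Optional
     * if the method does not exist, cannot be invoked, or returns null.
     */
    static Optional<Object> invokeNoArg(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return Optional.ofNullable(method.invoke(target));
        } catch (ReflectiveOperationException e) {
            // Covers NoSuchMethodException, IllegalAccessException and
            // InvocationTargetException: the API is treated as unavailable.
            return Optional.empty();
        }
    }

    /** Reports whether a class can be loaded, without linking against it. */
    static boolean classAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

        Because the method and class names are plain strings, the calling class still loads and runs on a classpath where those methods or classes are absent.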

        > Looks like now that the Class.forName line moved into an if block, the corresponding ClassNotFoundException should move in there as well.

        ClassNotFoundException happens in the outer if block as well, so I am not using a separate try/catch block for the inner if block.

        > The description in this JIRA says "it should try and reuse the hbase delegation token in JobConf" if the kerberos token is not available. Where does this reuse happen?

        Pig in the front end (when run from the command line) has access to the user's TGT (from either a keytab or kinit). It fetches the HBase delegation token by authenticating to HBase using KERBEROS and adds it to the JobConf. In the backend, HTable initialization picks up the delegation token from the JobConf and performs further operations on HBase using DIGEST authentication. The getDelegationToken call is the one call that cannot be made using DIGEST authentication (delegation token). The client always needs to authenticate with Kerberos to get the delegation token.

        With Oozie, the Pig frontend runs in a mapper and has no access to the user's TGT. The JobConf passed to Pig (using mapreduce.job.credentials.binary) already has the required delegation tokens to talk to all services: HDFS, JT and HBase. In HBaseStorage, we were always trying to get the HBase delegation token. Since there was no TGT, it was failing in the Pig launcher mapper. I have now added an extra check so that we don't try to fetch the token if we don't have a TGT.
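
        The two launch paths described above can be modelled with a toy sketch. It is an illustration under stated assumptions only: CredentialFlow and prepareJobCredentials are hypothetical names, a plain map of strings stands in for the job credentials, and the "hbase" key and token values are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how job credentials end up holding an HBase token. */
final class CredentialFlow {

    /** Illustrative key only; it is not the real token alias or service name. */
    static final String HBASE = "hbase";

    /**
     * @param inherited tokens already handed to the frontend, for example by
     *                  Oozie through the credentials file (empty when Pig is
     *                  run from the command line)
     * @param hasTgt    whether the frontend can authenticate with Kerberos
     */
    static Map<String, String> prepareJobCredentials(Map<String, String> inherited,
                                                     boolean hasTgt) {
        Map<String, String> credentials = new HashMap<>(inherited);
        if (hasTgt) {
            // Command-line case: authenticate with Kerberos and add a fresh token.
            credentials.put(HBASE, "token-fetched-with-kerberos");
        }
        // Oozie case: no TGT, so any inherited token is left untouched.
        return credentials;
    }
}
```

        In the command-line case the map starts empty and gains a freshly fetched token; in the Oozie case the inherited token is passed through unchanged, which is what the backend then uses for DIGEST authentication.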

        Yes. It would be good to apply this to 0.11.1 as well. I will do it.

        I will mark it back as patch available if it clarifies your questions. If you need me to add any comments to make it clear, I can do it.

        Rohini Palaniswamy added a comment -

        > Maybe a test would be helpful?

        Manually tested with the command line and Oozie against secure HBase for now. HADOOP-8078, which makes security unit tests possible, is only available in Hadoop 3.0.

        Dmitriy V. Ryaboy added a comment -

        got it.
        +1, go ahead and commit.

        D

        Rohini Palaniswamy added a comment -

        Thanks Dmitriy. Checked into 0.11.1 and trunk. Added a new section in CHANGES.txt for Release 0.11.1.

        Rohini Palaniswamy made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Fix Version/s: (none) -> 0.11.1 [ 12324080 ]
        Resolution: (none) -> Fixed [ 1 ]
        Bill Graham made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Michalis Kongtongk made changes -
        Link This issue is a clone of PIG-4115 [ PIG-4115 ]
        Transition                 Time In Source Status   Execution Times   Last Executer        Last Execution Date
        Open -> Patch Available    3d 14h 41m              1                 Rohini Palaniswamy   25/Feb/13 14:59
        Patch Available -> Open    3h                      1                 Dmitriy V. Ryaboy    25/Feb/13 17:59
        Open -> Resolved           20h 32m                 1                 Rohini Palaniswamy   26/Feb/13 14:32
        Resolved -> Closed         35d 1h 22m              1                 Bill Graham          02/Apr/13 16:54

          People

          • Assignee:
            Rohini Palaniswamy
            Reporter:
            Rohini Palaniswamy
          • Votes:
            0
            Watchers:
            3
