Ambari / AMBARI-21070

Race condition: webhdfs call mkdir /tmp/druid-indexing before /tmp making tmp not writable.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.2
    • Component/s: None
    • Labels: None

    Description

      Race condition: a WebHDFS mkdir of /tmp/druid-indexing runs before /tmp itself
      is created, which leaves /tmp not writable.

      During an HDP install through Ambari, at the "start components on host < >"
      step, WebHDFS operations run in the background to create the HDFS directory
      structures that specific components require (/tmp, /tmp/hive, /user/druid,
      /tmp/druid-indexing ...).

      The expected order is generally getfileInfo: /tmp --> mkdir: /tmp -->
      changePermission: /tmp to 777 (hdfs:hdfs), so that /tmp is accessible to all
      and the Hive Metastore can create /tmp/hive (the Hive scratch directory).
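
For reference, the expected sequence can be written out as WebHDFS REST requests. This is only a sketch: the NameNode address and the helper names are hypothetical, authentication parameters are omitted, and nothing is sent over the network. The trailing SETOWNER step is taken from the audit log of a healthy cluster shown later in this report.

```python
from urllib.parse import quote, urlencode

# Hypothetical NameNode HTTP address, used only for illustration.
NAMENODE = "http://namenode.example.com:50070"


def webhdfs_url(path, op, **params):
    """Build a WebHDFS REST URL for one operation on an absolute HDFS path."""
    query = urlencode({"op": op, **params})
    return f"{NAMENODE}/webhdfs/v1{quote(path)}?{query}"


def expected_tmp_requests(path="/tmp", mode="777", owner="hdfs", group="hdfs"):
    """Return (HTTP method, URL) pairs in the expected order, without sending them."""
    return [
        ("GET", webhdfs_url(path, "GETFILESTATUS")),  # audit cmd: getfileinfo
        ("PUT", webhdfs_url(path, "MKDIRS")),  # audit cmd: mkdirs
        ("PUT", webhdfs_url(path, "SETPERMISSION", permission=mode)),
        ("PUT", webhdfs_url(path, "SETOWNER", owner=owner, group=group)),
    ]


if __name__ == "__main__":
    for method, url in expected_tmp_requests():
        print(method, url)
```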

      In this case, which is specific to a Druid install, the mkdir of
      /tmp/druid-indexing is most often called before the actual /tmp creation, so
      /tmp is created implicitly with only the default directory permission (755).

      The next getfileInfo call on /tmp then reports that the directory already exists, so /tmp is neither created again nor given the 777 permission.

      This leaves /tmp not writable, so the HiveServer process shuts down because it
      cannot create or access /tmp/hive.
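
The ordering problem can be reproduced with a toy in-memory model. This is a minimal sketch, not Ambari's actual provisioning code: FakeHdfs, ensure_directory and run are hypothetical names, and the model assumes, as described above, that a mkdir of a child path creates a missing parent with the default 755 mode.

```python
DEFAULT_MODE = 0o755  # default directory permission described in this report


class FakeHdfs:
    """Toy namespace: maps an absolute path to its permission bits."""

    def __init__(self):
        self.modes = {"/": 0o755}

    def exists(self, path):
        return path in self.modes

    def mkdirs(self, path):
        # Create every missing ancestor with the default mode.
        current = ""
        for part in path.strip("/").split("/"):
            current += "/" + part
            self.modes.setdefault(current, DEFAULT_MODE)

    def set_permission(self, path, mode):
        self.modes[path] = mode


def ensure_directory(fs, path, mode):
    """Check-then-create: skip both mkdirs and chmod if the path already exists."""
    if fs.exists(path):
        return
    fs.mkdirs(path)
    fs.set_permission(path, mode)


def run(order):
    """Apply ensure_directory to each (path, mode) pair in the given order."""
    fs = FakeHdfs()
    for path, mode in order:
        ensure_directory(fs, path, mode)
    return fs.modes


good = run([("/tmp", 0o777), ("/tmp/druid-indexing", 0o755)])
bad = run([("/tmp/druid-indexing", 0o755), ("/tmp", 0o777)])
print(oct(good["/tmp"]), oct(bad["/tmp"]))  # prints: 0o777 0o755
```

When /tmp/druid-indexing is handled first, the later existence check on /tmp succeeds and the 777 permission is never applied.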

      hdfs-audit log:

      2017-05-12 06:39:51,067 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=getfileinfo src=/tmp/druid-indexing dst=null perm=null proto=webhdfs
      2017-05-12 06:39:51,120 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=contentSummary src=/user/druid dst=null perm=null proto=webhdfs
      2017-05-12 06:39:51,133 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,155 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=mkdirs src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,206 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid dst=null perm=null proto=webhdfs
      2017-05-12 06:39:51,235 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,249 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setPermission src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,290 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid/data dst=null perm=null proto=webhdfs
      2017-05-12 06:39:51,339 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active/ dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,341 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setOwner src=/tmp/druid-indexing dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,380 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=setOwner src=/user/druid/data dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,431 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/active dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,526 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
      2017-05-12 06:39:51,580 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.32.12 cmd=getfileinfo src=/apps/hbase/staging dst=null perm=null proto=webhdfs
      2017-05-12 06:39:51,620 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/active/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
      ....

      2017-05-12 06:39:53,289 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.202 cmd=getfileinfo src=/tmp dst=null perm=null proto=webhdfs

      The log shows /tmp/druid-indexing being accessed at 06:39:51 (so /tmp, created
      by that call, has only 755 permission), while the getfileinfo on /tmp happens
      at 06:39:53 and finds that /tmp already exists.
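
A small script can look for this pattern in an hdfs-audit log. The sketch below assumes the line layout shown above; the regular expression and the function names are illustrative and not part of any Hadoop tool.

```python
import re

# Matches the timestamp, cmd= and src= fields of an audit line as shown above.
AUDIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)"
)


def parse_audit(lines):
    """Yield (timestamp, cmd, src) for each line that looks like an audit entry."""
    for line in lines:
        match = AUDIT_RE.search(line.strip())
        if match:
            src = match.group("src").rstrip("/") or "/"
            yield match.group("ts"), match.group("cmd"), src


def implicit_parent_creations(lines, parent):
    """Return mkdirs on children of `parent` seen before any mkdirs on `parent`."""
    hits = []
    for ts, cmd, src in parse_audit(lines):
        if cmd != "mkdirs":
            continue
        if src == parent:
            break  # the parent is created explicitly from here on
        if src.startswith(parent.rstrip("/") + "/"):
            hits.append((ts, src))
    return hits


sample = [
    "2017-05-12 06:39:51,155 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) "
    "ip=/172.27.26.3 cmd=mkdirs src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs",
    "2017-05-12 06:39:53,289 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) "
    "ip=/172.27.26.202 cmd=getfileinfo src=/tmp dst=null perm=null proto=webhdfs",
]
print(implicit_parent_creations(sample, "/tmp"))
```

For the two sample lines taken from the log above, the only hit is the mkdirs on /tmp/druid-indexing at 06:39:51,155.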

      HiveServer2 then goes down with an AccessControlException (Permission denied:
      user=hive, access=WRITE, inode="/tmp/hive":hdfs:hdfs:drwxr-xr-x), because
      /tmp/hive itself could not be created:

      java.lang.RuntimeException: Error applying authorization policy on hive configuration: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/hive":hdfs:hdfs:drwxr-xr-x
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1922)
      at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4150)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:633)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

      [root@ctr-e133-1493418528701-32069-01-000002 hive]# hadoop fs -ls /
      Found 8 items
      drwxrwxrwx - yarn hadoop 0 2017-05-12 06:42 /app-logs
      drwxr-xr-x - hdfs hdfs 0 2017-05-12 06:41 /apps
      drwxr-xr-x - yarn hadoop 0 2017-05-12 06:39 /ats
      drwxr-xr-x - hdfs hdfs 0 2017-05-12 06:40 /hdp
      drwxr-xr-x - mapred hdfs 0 2017-05-12 06:40 /mapred
      drwxrwxrwx - mapred hadoop 0 2017-05-12 06:40 /mr-history
      drwxr-xr-x - hdfs hdfs 0 2017-05-12 06:40 /tmp
      drwxr-xr-x - hdfs hdfs 0 2017-05-12 07:17 /user

      [root@ctr-e133-1493418528701-32069-01-000002 hive]# hadoop fs -ls /tmp
      Found 2 items
      drwxr-xr-x - druid hdfs 0 2017-05-12 07:50 /tmp/druid-indexing
      drwxr-xr-x - hdfs hdfs 0 2017-05-12 06:40 /tmp/entity-file-history
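
The same check can be scripted against captured `hadoop fs -ls` output. This is a rough sketch under the assumption that each entry has the eight whitespace-separated fields shown above and a plain 10-character mode string (no ACL marker); parse_ls and world_writable are hypothetical helper names.

```python
def parse_ls(output):
    """Map each listed path to (permission string, owner, group)."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        # Entry lines start with a 10-character mode string such as drwxr-xr-x.
        if len(fields) >= 8 and len(fields[0]) == 10 and fields[0][0] in "d-":
            perms, _replication, owner, group = fields[:4]
            entries[fields[-1]] = (perms, owner, group)
    return entries


def world_writable(perms):
    """True when the 'other' write bit is set in an ls-style mode string."""
    return perms[8] == "w"


listing = """Found 8 items
drwxrwxrwx - yarn hadoop 0 2017-05-12 06:42 /app-logs
drwxr-xr-x - hdfs hdfs 0 2017-05-12 06:40 /tmp
"""

entries = parse_ls(listing)
for path, (perms, owner, group) in entries.items():
    status = "world-writable" if world_writable(perms) else "NOT world-writable"
    print(path, perms, f"{owner}:{group}", status)
```

On the listing from this report, /tmp is flagged as not world-writable, which matches the failure seen by HiveServer2.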

      On a cluster where /tmp was created with the proper permissions, the audit
      log looks like the following (these clusters have no Druid installed):

      2017-05-18 00:25:18,092 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:KERBEROS) ip=/172.27.25.65 cmd=getfileinfo src=/tmp dst=null perm=null proto=webhdfs
      2017-05-18 00:25:18,195 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:KERBEROS) ip=/172.27.25.65 cmd=mkdirs src=/tmp dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
      2017-05-18 00:25:18,290 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:KERBEROS) ip=/172.27.25.65 cmd=setPermission src=/tmp dst=null perm=hdfs:hdfs:rwxrwxrwx proto=webhdfs
      2017-05-18 00:25:18,385 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:KERBEROS) ip=/172.27.25.65 cmd=setOwner src=/tmp dst=null perm=hdfs:hdfs:rwxrwxrwx proto=webhdfs

      Attachments

        1. AMBARI-21070.patch
          6 kB
          Andrew Onischuk

            People

              Assignee: Andrew Onischuk (aonishuk)
              Reporter: Andrew Onischuk (aonishuk)