  1. Hadoop Common
  2. HADOOP-8845

When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: fs

    Description

      A brief description from my colleague Stephen Fritz, who helped discover it:

      [root@node1 ~]# su - hdfs
      -bash-4.1$ echo "My Test String">testfile <-- just a text file, for testing below
      -bash-4.1$ hadoop dfs -mkdir /tmp/testdir <-- create a directory
      -bash-4.1$ hadoop dfs -mkdir /tmp/testdir/1 <-- create a subdirectory
      -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/1/testfile <-- put the test file in the subdirectory
      -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/testfile <-- put the test file in the directory
      -bash-4.1$ hadoop dfs -lsr /tmp/testdir
      drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
      -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
      -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
      All files are where we expect them...OK, let's try reading
      
      -bash-4.1$ hadoop dfs -cat /tmp/testdir/testfile
      My Test String <-- success!
      
      -bash-4.1$ hadoop dfs -cat /tmp/testdir/1/testfile
      My Test String <-- success!
      
      -bash-4.1$ hadoop dfs -cat /tmp/testdir/*/testfile
      My Test String <-- success!  
      Note that we used an '*' in the cat command; it correctly found the subdirectory '/tmp/testdir/1' and ignored the regular file '/tmp/testdir/testfile'
      
      -bash-4.1$ exit
      logout
      [root@node1 ~]# su - testuser <-- let's try it as a different user:
      [testuser@node1 ~]$ hadoop dfs -lsr /tmp/testdir
      drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
      -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
      -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
      [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/testfile
      My Test String <-- good
      
      [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/1/testfile
      My Test String <-- so far so good
      
      [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/*/testfile
      cat: org.apache.hadoop.security.AccessControlException: Permission denied: user=testuser, access=EXECUTE, inode="/tmp/testdir/testfile":hdfs:hadoop:-rw-r--r--
      

      Essentially, we hit an AccessControlException (ACE) with access=EXECUTE on the file /tmp/testdir/testfile because the glob expansion tried to access /tmp/testdir/testfile/testfile as a path. This shouldn't happen, as testfile is a regular file and not a parent path to be looked up under.

      2012-09-25 07:24:27,406 INFO org.apache.hadoop.ipc.Server: IPC Server
      handler 2 on 8020, call getFileInfo(/tmp/testdir/testfile/testfile)
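
      The same failing lookup can be reproduced directly through the FileSystem API. A minimal sketch, assuming a default-configured HDFS client run as testuser (the class name is illustrative, the paths are those from the transcript above):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class GlobRepro {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          // Expands the same pattern the shell used; the glob code probes
          // /tmp/testdir/testfile as if it were a directory, so a non-superuser
          // gets AccessControlException (access=EXECUTE) instead of the matches.
          FileStatus[] matches = fs.globStatus(new Path("/tmp/testdir/*/testfile"));
          if (matches != null) {
            for (FileStatus st : matches) {
              System.out.println(st.getPath());
            }
          }
        }
      }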
      

      Surprisingly, the superuser avoids the error because it bypasses permission checks; whether that behavior is acceptable can be examined in another JIRA.

      This JIRA targets a client-side fix so that glob expansion does not attempt such /path/file/dir or /path/file/file lookups.
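
      A rough sketch of that filtering idea, assuming a helper inside the glob expansion code (this is illustrative, not the attached patch): before descending into the entries matched by an intermediate glob component, drop anything that is not a directory, so a regular file like /tmp/testdir/testfile is never treated as a parent path.

      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      class GlobParentFilterSketch {
        // Hypothetical helper: keep only candidates that can actually contain the
        // next path component, i.e. directories. Files are silently skipped instead
        // of being probed as /file/child, which is what triggered the ACE above.
        static List<Path> keepDirectories(FileSystem fs, List<Path> candidates)
            throws IOException {
          List<Path> parents = new ArrayList<Path>();
          for (Path p : candidates) {
            FileStatus st = fs.getFileStatus(p);
            if (st.isDirectory()) {
              parents.add(p);
            }
          }
          return parents;
        }
      }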

      Attachments

        1. HADOOP-8845.patch
          3 kB
          Harsh J
        2. HADOOP-8845.patch
          5 kB
          Harsh J
        3. HADOOP-8845.patch
          4 kB
          Harsh J


            People

              Assignee: Harsh J (qwertymaniac)
              Reporter: Harsh J (qwertymaniac)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: