Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels:
      None

      Description

      HDFS-9433 made an incompatible change to the semantics of getEZForPath. It used to return the EZ of the closest ancestor path. It never threw FNF. A common use of getEZForPath to determining if a file can be renamed, or must be copied due to mismatched EZs. Notably, this has broken hive.

        Issue Links

          Activity

          Hide
          rakeshr Rakesh R added a comment -

          Thank you Daryn Sharp for pointing out this. The javadoc in HdfsAdmin#getEncryptionZoneForPath says that if the path does not exist then throws FNF. IMHO, javadoc has to be modified to convey the message clearly so that we could avoid such situations in future. I'm interested to take the discussion ahead and work on this jira.

          Reference HdfsAdmin.java#L335

             * Get the path of the encryption zone for a given file or directory.
             *
             * @param path The path to get the ez for.
             *
             * @return The EncryptionZone of the ez, or null if path is not in an ez.
             * @throws IOException            if there was a general IO exception
             * @throws AccessControlException if the caller does not have access to path
             * @throws FileNotFoundException  if the path does not exist
          

          discussion thread

          Show
          rakeshr Rakesh R added a comment - Thank you Daryn Sharp for pointing out this. The javadoc in HdfsAdmin#getEncryptionZoneForPath says that if the path does not exist then throws FNF. IMHO, javadoc has to be modified to convey the message clearly so that we could avoid such situations in future. I'm interested to take the discussion ahead and work on this jira. Reference HdfsAdmin.java#L335 * Get the path of the encryption zone for a given file or directory. * * @param path The path to get the ez for . * * @ return The EncryptionZone of the ez, or null if path is not in an ez. * @ throws IOException if there was a general IO exception * @ throws AccessControlException if the caller does not have access to path * @ throws FileNotFoundException if the path does not exist discussion thread
          Hide
          andrew.wang Andrew Wang added a comment -

          Sergio Peña could you help shed some light on how Hive uses this API? Wondering if this is intentional, or if it could be worked around somehow.

          Show
          andrew.wang Andrew Wang added a comment - Sergio Peña could you help shed some light on how Hive uses this API? Wondering if this is intentional, or if it could be worked around somehow.
          Hide
          spena Sergio Peña added a comment -

          Andrew Wang The following class is the one that contains the calls to getEncryptionZoneForPath (search for all method calls there):
          https://github.com/apache/hive/blob/master/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L1118

          The methods that call such function are:

          public boolean isPathEncrypted(Path path) throws IOException;
          public boolean arePathsOnSameEncryptionZone(Path path1, Path path2) throws IOException;
          public boolean arePathsOnSameEncryptionZone(Path path1, Path path2, HadoopShims.HdfsEncryptionShim encryptionShim2) throws IOException;
          public int comparePathKeyStrength(Path path1, Path path2) throws IOException;
          
          Show
          spena Sergio Peña added a comment - Andrew Wang The following class is the one that contains the calls to getEncryptionZoneForPath (search for all method calls there): https://github.com/apache/hive/blob/master/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L1118 The methods that call such function are: public boolean isPathEncrypted(Path path) throws IOException; public boolean arePathsOnSameEncryptionZone(Path path1, Path path2) throws IOException; public boolean arePathsOnSameEncryptionZone(Path path1, Path path2, HadoopShims.HdfsEncryptionShim encryptionShim2) throws IOException; public int comparePathKeyStrength(Path path1, Path path2) throws IOException;
          Hide
          andrew.wang Andrew Wang added a comment -

          Okay, from a quick look at the Hive code, one possible issue is with SemanticAnalyzer#getStagingDirectoryPathname, since it passes in a temp path that might not exist yet. Correct comparison would be with the parent dir instead.

          Daryn Sharp do you have a stack trace we can look at?

          Also, do we have any other "query" APIs that take a path, but do not throw FNF? This seems semantically weird to me, which is why I'm wondering if we can fix this in Hive instead.

          Show
          andrew.wang Andrew Wang added a comment - Okay, from a quick look at the Hive code, one possible issue is with SemanticAnalyzer#getStagingDirectoryPathname , since it passes in a temp path that might not exist yet. Correct comparison would be with the parent dir instead. Daryn Sharp do you have a stack trace we can look at? Also, do we have any other "query" APIs that take a path, but do not throw FNF? This seems semantically weird to me, which is why I'm wondering if we can fix this in Hive instead.
          Hide
          daryn Daryn Sharp added a comment -

          I'll try to dig out the stacktrace, but the code in question was checking the encryption zone of both source and dest to determine if a rename is possible or must fallback to copy. This explained why I started seeing spikes of getEZForPath calls when we don't even use encryption zones!

          IMHO, rename should throw an IOE-derived IncompatibleEncryptionZonesException instead a bland IOE. This would allow a client to blindly attempt the rename, catch the specific exception, fallback to copy if required. In the vast majority of cases that removes 2 junk calls for non-existence EZs.

          In the name of compatibility, I lean towards reverting the incompatible change. This is one of the reasons 2.8 certification has ground to a halt. One could argue that most calls throw FNF because they query or manipulate a specific path. If it's not there, game over. But in the case of encryption zones and erasure coding, these features are tree-based. Properties are inherited by searching up the ancestor paths so it's more about asking "what is or would the EC or EZ be for this path?".

          In any case, it now requires 3X rpcs to do a simple rename. I want specific exceptions to eliminate the 2 junk calls. The trend of new "always on" features... with direct or indirect non-trivial performance costs... for common operations... when not used... is becoming rather irritating. I'm sisyphus@hdfs.hadoop.org and I approve this rant message.

          Show
          daryn Daryn Sharp added a comment - I'll try to dig out the stacktrace, but the code in question was checking the encryption zone of both source and dest to determine if a rename is possible or must fallback to copy. This explained why I started seeing spikes of getEZForPath calls when we don't even use encryption zones! IMHO, rename should throw an IOE-derived IncompatibleEncryptionZonesException instead a bland IOE. This would allow a client to blindly attempt the rename, catch the specific exception, fallback to copy if required. In the vast majority of cases that removes 2 junk calls for non-existence EZs. In the name of compatibility, I lean towards reverting the incompatible change. This is one of the reasons 2.8 certification has ground to a halt. One could argue that most calls throw FNF because they query or manipulate a specific path. If it's not there, game over. But in the case of encryption zones and erasure coding, these features are tree-based. Properties are inherited by searching up the ancestor paths so it's more about asking "what is or would the EC or EZ be for this path?". In any case, it now requires 3X rpcs to do a simple rename. I want specific exceptions to eliminate the 2 junk calls. The trend of new "always on" features... with direct or indirect non-trivial performance costs... for common operations... when not used... is becoming rather irritating. I'm sisyphus@hdfs.hadoop.org and I approve this rant message.
          Hide
          andrew.wang Andrew Wang added a comment -

          Agree about throwing a specific rename exception.

          I poked around our API to try and find precedent for this old behavior of returning null rather than throwing FNF. checkAccess was a possible candidate since we don't require existence when doing write operations (e.g. mkdirs), but it also throws FNF. Cache directives do not throw FNF, but that's not an API example I'd like to repeat. We should have attached them to the inode.

          In the name of compatibility, we should revert this from the branch-2s. I'd hope by the time 3.0 rolls around, we've either fixed Hive to call this on parent dirs instead, or better, moved over to "rename falling back to copy on special IOException" as Daryn proposed.

          Sergio Peña does this sound reasonable from the Hive side?

          Show
          andrew.wang Andrew Wang added a comment - Agree about throwing a specific rename exception. I poked around our API to try and find precedent for this old behavior of returning null rather than throwing FNF. checkAccess was a possible candidate since we don't require existence when doing write operations (e.g. mkdirs ), but it also throws FNF. Cache directives do not throw FNF, but that's not an API example I'd like to repeat. We should have attached them to the inode. In the name of compatibility, we should revert this from the branch-2s. I'd hope by the time 3.0 rolls around, we've either fixed Hive to call this on parent dirs instead, or better, moved over to "rename falling back to copy on special IOException" as Daryn proposed. Sergio Peña does this sound reasonable from the Hive side?
          Hide
          daryn Daryn Sharp added a comment -

          I'm not sure querying the EZ for the parent directory is the right fix, and we should consider that code other than hive depends on the old semantics.

          Depending on which rename is called, and if the dest is a dir, it will move the src into the dest dir. So let's say I want to rename from /dir1/file to /dir2/dir-with-EZ. The result may be /dir2/dir-with-EZ/file. If the parent is queried, ie. /dir2, the rename will look like it should succeed (src and dest have no EZ) but will fail. The old debatably broken semantics cover this case.

          I realized another similar use case to consider. Let's I want to create a file only if the path is under an EZ. Should I have to query the parent's EZ, catch FNF, repeat until I find an EZ or hit the root dir? The no-FNF semantics require 1 RPC.

          In the end we may need to consider preserving the no-FNF semantics and add specific exceptions.

          Show
          daryn Daryn Sharp added a comment - I'm not sure querying the EZ for the parent directory is the right fix, and we should consider that code other than hive depends on the old semantics. Depending on which rename is called, and if the dest is a dir, it will move the src into the dest dir. So let's say I want to rename from /dir1/file to /dir2/dir-with-EZ. The result may be /dir2/dir-with-EZ/file. If the parent is queried, ie. /dir2, the rename will look like it should succeed (src and dest have no EZ) but will fail. The old debatably broken semantics cover this case. I realized another similar use case to consider. Let's I want to create a file only if the path is under an EZ. Should I have to query the parent's EZ, catch FNF, repeat until I find an EZ or hit the root dir? The no-FNF semantics require 1 RPC. In the end we may need to consider preserving the no-FNF semantics and add specific exceptions.
          Hide
          kihwal Kihwal Lee added a comment -

          +1 on revert. We can then reopen HDFS-9433 and decide what to do.

          Show
          kihwal Kihwal Lee added a comment - +1 on revert. We can then reopen HDFS-9433 and decide what to do.
          Hide
          andrew.wang Andrew Wang added a comment -

          Okay, sounds good. Since this was released in 3.0.0-alpha1, let's use this JIRA to track the revert for changelog purposes. We can open a new JIRA for the special rename exception.

          Show
          andrew.wang Andrew Wang added a comment - Okay, sounds good. Since this was released in 3.0.0-alpha1, let's use this JIRA to track the revert for changelog purposes. We can open a new JIRA for the special rename exception.
          Hide
          andrew.wang Andrew Wang added a comment -

          Rakesh, hope you don't mind, but I spent a little time making a new patch based on the revert. Besides reverting, I also updated the javadoc for HdfsAdmin and modified the test to assert the "returns null for non-existent path" behavior.

          Show
          andrew.wang Andrew Wang added a comment - Rakesh, hope you don't mind, but I spent a little time making a new patch based on the revert. Besides reverting, I also updated the javadoc for HdfsAdmin and modified the test to assert the "returns null for non-existent path" behavior.
          Hide
          kihwal Kihwal Lee added a comment -

          I tried reverting and compared with the patch. It looks like the FNFE at HdfsAdmin#getEncryptionZoneForPath() came from HDFS-6546. The changes look good to me.

          Show
          kihwal Kihwal Lee added a comment - I tried reverting and compared with the patch. It looks like the FNFE at HdfsAdmin#getEncryptionZoneForPath() came from HDFS-6546 . The changes look good to me.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 10s Maven dependency ordering for branch
          +1 mvninstall 6m 55s trunk passed
          +1 compile 1m 28s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 27s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 3m 9s trunk passed
          +1 javadoc 1m 17s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 1m 18s the patch passed
          +1 compile 1m 23s the patch passed
          +1 javac 1m 23s the patch passed
          -0 checkstyle 0m 29s hadoop-hdfs-project: The patch generated 1 new + 98 unchanged - 0 fixed = 99 total (was 98)
          +1 mvnsite 1m 21s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 22s the patch passed
          +1 javadoc 1m 10s the patch passed
          +1 unit 0m 55s hadoop-hdfs-client in the patch passed.
          +1 unit 59m 5s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          86m 52s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10850
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830949/HDSF-10850.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e188ccd10673 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 236ac77
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16931/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16931/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16931/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 10s Maven dependency ordering for branch +1 mvninstall 6m 55s trunk passed +1 compile 1m 28s trunk passed +1 checkstyle 0m 32s trunk passed +1 mvnsite 1m 27s trunk passed +1 mvneclipse 0m 25s trunk passed +1 findbugs 3m 9s trunk passed +1 javadoc 1m 17s trunk passed 0 mvndep 0m 7s Maven dependency ordering for patch +1 mvninstall 1m 18s the patch passed +1 compile 1m 23s the patch passed +1 javac 1m 23s the patch passed -0 checkstyle 0m 29s hadoop-hdfs-project: The patch generated 1 new + 98 unchanged - 0 fixed = 99 total (was 98) +1 mvnsite 1m 21s the patch passed +1 mvneclipse 0m 20s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 22s the patch passed +1 javadoc 1m 10s the patch passed +1 unit 0m 55s hadoop-hdfs-client in the patch passed. +1 unit 59m 5s hadoop-hdfs in the patch passed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 86m 52s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10850 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830949/HDSF-10850.001.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e188ccd10673 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 236ac77 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16931/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16931/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16931/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          rakeshr Rakesh R added a comment -

          Thank you Andrew Wang for your time and discussions to get the consensus. The patch looks good to me, apart from one minor checkstyle warning.

          Unused import - java.io.FileNotFoundException.

          +1 (non-binding)

          Show
          rakeshr Rakesh R added a comment - Thank you Andrew Wang for your time and discussions to get the consensus. The patch looks good to me, apart from one minor checkstyle warning. Unused import - java.io.FileNotFoundException. +1 (non-binding)
          Hide
          kihwal Kihwal Lee added a comment -

          +1 for the patch.

          Show
          kihwal Kihwal Lee added a comment - +1 for the patch.
          Hide
          kihwal Kihwal Lee added a comment -

          Thanks for investigating and fixing this, Andrew and thanks for reviewing the fix, Rakesh. I've verified TestEncryptionZones passes on branch-2 and branch-2.8 after the patch.

          Show
          kihwal Kihwal Lee added a comment - Thanks for investigating and fixing this, Andrew and thanks for reviewing the fix, Rakesh. I've verified TestEncryptionZones passes on branch-2 and branch-2.8 after the patch.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10518 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10518/)
          HDFS-10850. getEZForPath should NOT throw FNF. Contributed by Andrew (kihwal: rev 0670149c88852cd7c4d6774bff06c7c588558739)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsAdmin.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10518 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10518/ ) HDFS-10850 . getEZForPath should NOT throw FNF. Contributed by Andrew (kihwal: rev 0670149c88852cd7c4d6774bff06c7c588558739) (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsAdmin.java

            People

            • Assignee:
              andrew.wang Andrew Wang
              Reporter:
              daryn Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development