  1. Hadoop HDFS
  2. HDFS-6193

HftpFileSystem open should throw FileNotFoundException for non-existing paths

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • None
    • None

    Description

      WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle non-existing paths.

      • 'open' does not really open anything: it does not contact the server and therefore cannot discover that the file is missing. The FileNotFoundException is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work; in POSIX you get ENOENT on open. LzoInputFormat.getSplits is an example of code that is broken because of this.
      • On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths.
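
To illustrate the server-side point, here is a minimal sketch of mapping an exception to an HTTP status and back. It is not the code from the attached patches: the class name StatusMappingSketch and both methods are hypothetical, and java.net.HttpURLConnection constants are used in place of the servlet API's SC_NOT_FOUND (404) and SC_BAD_REQUEST (400) so that the example has no dependencies beyond the JDK.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.HttpURLConnection;

class StatusMappingSketch {
    /**
     * Hypothetical server-side mapping from an exception to an HTTP status.
     * A missing file becomes 404 rather than 400, so the client can tell
     * "no such path" apart from a malformed request.
     */
    static int toHttpStatus(Exception e) {
        if (e instanceof FileNotFoundException) {
            return HttpURLConnection.HTTP_NOT_FOUND;      // 404
        }
        if (e instanceof IllegalArgumentException) {
            return HttpURLConnection.HTTP_BAD_REQUEST;    // 400
        }
        return HttpURLConnection.HTTP_INTERNAL_ERROR;     // 500
    }

    /**
     * Hypothetical client-side counterpart: turn the status back into the
     * exception type that callers of open() expect.
     */
    static IOException toException(int status, String path) {
        if (status == HttpURLConnection.HTTP_NOT_FOUND) {
            return new FileNotFoundException("File does not exist: " + path);
        }
        return new IOException("HTTP " + status + " for " + path);
    }
}
```

With a mapping of this shape, a 404 from the servlet can be surfaced to the application as a FileNotFoundException instead of a generic IOException.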

      Attachments

        1. HDFS-6193-branch-2.4.v02.patch
          20 kB
          Gera Shegalov
        2. HDFS-6193-branch-2.4.0.v01.patch
          14 kB
          Gera Shegalov

        Issue Links

          Activity

            stevel@apache.org Steve Loughran added a comment -

            linking to HADOOP-9361 and FS semantics.

            Failing on the open if a file is not found is a core expectation of filesystems.

            We could optimise any of the web filesystems by not doing that open (e.g., S3, s3n, swift) and waiting for the first seek. But we don't, because things expect missing files to not be there.

            Interesting that FileSystemContractBaseTest doesn't catch this
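
A contract-style check for this expectation could look roughly like the sketch below. It is not taken from FileSystemContractBaseTest; the Opener interface and the class and method names are hypothetical, and the local file system (java.io.FileInputStream) stands in for a real FileSystem. The key point is that the check never calls read(), so an implementation that defers the failure does not pass.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenContractCheck {
    /** Hypothetical abstraction so the same check can run against any open() implementation. */
    interface Opener {
        InputStream open(String path) throws IOException;
    }

    /**
     * Returns true only if open() itself throws FileNotFoundException.
     * No read() is issued, so a lazily failing implementation returns false.
     */
    static boolean failsOnOpen(Opener opener, String missingPath) throws IOException {
        try (InputStream in = opener.open(missingPath)) {
            return false; // open succeeded on a missing path: expectation violated
        } catch (FileNotFoundException expected) {
            return true;
        }
    }

    /** Produces a path that should not exist by creating and then deleting a temp file. */
    static String missingLocalPath() throws IOException {
        Path p = Files.createTempFile("open-contract", ".tmp");
        Files.delete(p);
        return p.toString();
    }
}
```

For example, `failsOnOpen(p -> new java.io.FileInputStream(p), missingLocalPath())` should return true on the local file system, whereas an opener that only fails inside read() returns false.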

            jira.shegalov Gera Shegalov added a comment -

            stevel@apache.org, thanks for following up.

            Interesting that FileSystemContractBaseTest doesn't catch this

            FileSystemContractBaseTest does not have a test for open on a non-existing path. Neither did TestHftpFileSystem. TestWebHdfsFileSystemContract.testOpenNonExistFile had an incorrect implementation that relied on read to fail.

            We could optimise any of the web filesystems by not doing that open (e.g., S3, s3n, swift) and waiting for the first seek. But we don't, because things expect missing files to not be there.

            Note that a seek for WebHdfs/Hftp is a client-only operation as well. Deferring real open to a stream operation is misleading because the application presumes an open stream when issuing a stream operation.
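
The difference between a deferred and an eager open can be sketched as follows. This is an illustration under stated assumptions, not the WebHdfs/Hftp code: the in-memory STORE map is a hypothetical stand-in for the remote server, and lazyOpen, eagerOpen, and connect are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

class LazyVsEagerOpen {
    /** Hypothetical stand-in for the remote server: path -> file contents. */
    static final Map<String, byte[]> STORE = new HashMap<>();

    /** Simulates the round trip to the server; fails if the path is unknown. */
    static InputStream connect(String path) throws FileNotFoundException {
        byte[] data = STORE.get(path);
        if (data == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return new ByteArrayInputStream(data);
    }

    /** Deferred open: returns a stream without any lookup; the failure surfaces on first read(). */
    static InputStream lazyOpen(String path) {
        return new InputStream() {
            private InputStream in; // connected on first use

            @Override
            public int read() throws IOException {
                if (in == null) {
                    in = connect(path);
                }
                return in.read();
            }
        };
    }

    /** Eager open: performs the lookup immediately, so a missing path fails here. */
    static InputStream eagerOpen(String path) throws IOException {
        return connect(path);
    }
}
```

With lazyOpen, the caller holds what looks like an open stream for a file that does not exist; with eagerOpen, the FileNotFoundException is raised at the call site that named the path.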

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12638942/HDFS-6193-branch-2.4.0.v01.patch
            against trunk revision .

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6598//console

            This message is automatically generated.

            ozawa Tsuyoshi Ozawa added a comment -

            Is HftpFileSystem missing from trunk now? Please correct me if I'm wrong.

            jira.shegalov Gera Shegalov added a comment -

            Hi ozawa, yeah Hftp was recently kicked out with HDFS-5570

            ozawa Tsuyoshi Ozawa added a comment -

            Thanks for the pointer, jira.shegalov! Now I can apply your patch against branch-2.4.0. However, some compilation errors occur with the patch.

            In HftpFileSystem, RangeHeaderInputStream cannot call the super constructor as follows:

            static class RangeHeaderInputStream extends ByteRangeInputStream {
              RangeHeaderInputStream(RangeHeaderUrlOpener o, RangeHeaderUrlOpener r)
                  throws IOException {
                super(o, r, true);
              }
            }
            

            FileDataServlet: the method ExceptionHandler.toHttpStatus is missing:

                  response.sendError(ExceptionHandler.toHttpStatus(e),
                      StringUtils.stringifyException(e));
            

            Can you check them? Thanks!

            jira.shegalov Gera Shegalov added a comment -

            Will upload a fixed version shortly.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12643130/HDFS-6193-branch-2.4.v02.patch
            against trunk revision .

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6799//console

            This message is automatically generated.

            ozawa Tsuyoshi Ozawa added a comment -

            Thank you for updating! +1 for the patch(non-binding).

            • Compilation works correctly.
            • Confirmed that WebHdfsFileSystem.open() and HftpFileSystem.open() throw FileNotFoundException when files are missing. Test cases cover it.
            ozawa Tsuyoshi Ozawa added a comment -

            Let's wait for review by HDFS experts.

            wheat9 Haohui Mai added a comment -

            I don't think this is a blocker, since hftp / hsftp have been deprecated and superseded by webhdfs. It looks to me that the performance impact is still up for debate (the same fix has been applied to webhdfs in HDFS-6143; see the comments there for details).

            I'm moving it out to unblock 2.4.1. Feel free to move it back if you think it is essential for the release.

            hadoopqa Hadoop QA added a comment -



            -1 overall

            Vote  Subsystem  Runtime  Comment
            -1    patch      0m 0s    The patch command could not apply the patch during dryrun.

            Subsystem       Report/Notes
            Patch URL       http://issues.apache.org/jira/secure/attachment/12643130/HDFS-6193-branch-2.4.v02.patch
            Optional Tests  javadoc javac unit findbugs checkstyle
            git revision    trunk / f1a152c
            Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/10603/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment -



            -1 overall

            Vote  Subsystem  Runtime  Comment
            -1    patch      0m 0s    The patch command could not apply the patch during dryrun.

            Subsystem       Report/Notes
            Patch URL       http://issues.apache.org/jira/secure/attachment/12643130/HDFS-6193-branch-2.4.v02.patch
            Optional Tests  javadoc javac unit findbugs checkstyle
            git revision    trunk / f1a152c
            Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/10609/console

            This message was automatically generated.


            People

              Assignee: jira.shegalov Gera Shegalov
              Reporter: jira.shegalov Gera Shegalov
              Votes: 0
              Watchers: 5
