HDFS-11280: Allow WebHDFS to reuse HTTP connections to NN


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3, 2.6.5, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      WebHDFSClient calls "conn.disconnect()", which closes the connection to the NameNode. When webhdfs is used as the source in distcp, this exhausts the ephemeral ports on the client side, because every closed connection keeps occupying its port in TIME_WAIT state for some time.

      According to http://tinyurl.com/java7-http-keepalive, we should call conn.getInputStream().close() instead, so that the connection is kept alive and can be reused. This gets rid of the ephemeral port problem.
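The idea can be sketched in isolation as follows. This is a minimal illustration, not the attached patch: the class name KeepAliveSketch, both helper methods, and the default URL are hypothetical, and draining the body before closing it is an extra step assumed here because the JDK only returns a connection to its keep-alive cache once the response has been fully read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class, for illustration only; not part of Hadoop.
final class KeepAliveSketch {

    /** Reads the stream to EOF, closes it, and returns the number of bytes consumed. */
    static long drainAndClose(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream body = in) {
            int n;
            while ((n = body.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /**
     * Issues a GET and releases the connection by consuming and closing the
     * response body instead of calling conn.disconnect(), so the underlying
     * socket can stay in the JDK's keep-alive cache for the next request.
     */
    static int requestAndRelease(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        // Error responses expose their body through getErrorStream(), which may be null.
        InputStream body = (status >= 400) ? conn.getErrorStream() : conn.getInputStream();
        if (body != null) {
            drainAndClose(body);
        }
        // Deliberately no conn.disconnect() here: that would close the socket.
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Assumed endpoint; pass the real NameNode HTTP URL as the first argument.
        String target = args.length > 0
                ? args[0]
                : "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS";
        URL url = URI.create(target).toURL();
        for (int i = 0; i < 5; i++) {
            System.out.println("HTTP " + requestAndRelease(url));
        }
    }
}
```

Running the loop in main against a NameNode while watching netstat should show the requests sharing a connection, provided the http.keepAlive system property is left at its default of true.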

      Manual steps used to verify the bug fix:
      1. Build the original hadoop jar.
      2. Run distcp with webhdfs as the source; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows a large count (in the hundreds).
      3. Build the hadoop jar with this diff.
      4. Run distcp with webhdfs as the source again; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows 0.
      5. The explanation: distcp's client side does a lot of directory scanning, which creates and closes a lot of connections to the NameNode HTTP port.
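For repeated checks, the netstat pipeline in steps 2 and 4 could be wrapped in a small program. The sketch below is an assumption-laden convenience, not part of the patch: the class TimeWaitCount and its method are hypothetical, and it mirrors the two grep filters with plain substring matching, so like the original command it counts any line that contains both "TIME_WAIT" and the port text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TimeWaitCount {

    /** Counts lines containing both "TIME_WAIT" and the given port text. */
    static long countTimeWait(List<String> netstatLines, String port) {
        return netstatLines.stream()
                .filter(line -> line.contains("TIME_WAIT"))
                .filter(line -> line.contains(port))
                .count();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Runs "netstat -n" and collects its output line by line.
        Process proc = new ProcessBuilder("netstat", "-n")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        proc.waitFor();
        // 50070 is the NameNode HTTP port used in the steps above.
        System.out.println(countTimeWait(lines, "50070"));
    }
}
```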

      Reference:
      2.7 and below: https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743

      2.8 and above: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898

      Attachments

        1. HDFS-11280.for.2.7.and.below.patch
          0.9 kB
          Zheng Shao
        2. HDFS-11280.for.2.8.and.beyond.patch
          0.9 kB
          Zheng Shao
        3. HDFS-11280.for.2.8.and.beyond.2.patch
          0.8 kB
          Zheng Shao
        4. HDFS-11280.for.2.8.and.beyond.3.patch
          2 kB
          Zheng Shao
        5. HDFS-11280.for.2.8.and.beyond.4.patch
          2 kB
          Zheng Shao
        6. HDFS-11280.for.2.8.and.beyond.5.patch
          2 kB
          Zheng Shao


          People

            Assignee: Zheng Shao (zshao)
            Reporter: Zheng Shao (zshao)
            Votes: 0
            Watchers: 13
