Hadoop HDFS / HDFS-7175

Client-side SocketTimeoutException during Fsck


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.1.4, 3.2.2
    • Component/s: namenode
    • Labels: None

    Description

      HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option). We have observed that, without status reporting, the client will abort with a read timeout:

      [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
      Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
      14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@GRID.LINKEDIN.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
      Exception in thread "main" java.net.SocketTimeoutException: Read timed out
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:152)
      	at java.net.SocketInputStream.read(SocketInputStream.java:122)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
      	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      	at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
      	at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
      	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
      	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      	at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
      

      Since there's nothing for the client to read, it will abort if the time required to complete the fsck operation is longer than the client's read timeout setting.
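
      For reference, the timeout comes from the client's HTTP read timeout: DFSck opens the NameNode's fsck servlet over HTTP (visible in the stack trace above) and blocks until the server sends something. A minimal illustration of the failure mode follows; the hostname, port, query string, and 60-second timeout are made up for the example, and this is not the DFSck code itself:

      import java.net.HttpURLConnection;
      import java.net.URL;

      public class FsckReadTimeoutDemo {
          public static void main(String[] args) throws Exception {
              // Illustrative URL; DFSck builds the real fsck URL from the
              // configured NameNode HTTP address.
              URL url = new URL("http://namenode.example.com:50070/fsck?path=%2F");
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              // If the server writes nothing (not even the response headers)
              // within this window, the read below fails with
              // java.net.SocketTimeoutException: Read timed out.
              conn.setReadTimeout(60_000);
              conn.connect();
              System.out.println(conn.getInputStream().read());
          }
      }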

      I can think of a few ways to fix this:

      1. Set an infinite read timeout on the client side (not a good idea!).
      2. Have the server side write (and flush) zeros to the wire and instruct the client to ignore these characters instead of echoing them (see the sketch after this list).
      3. It's possible that flushing an empty buffer on the server side will trigger an HTTP response with a zero-length payload. That may be enough to keep the client from hanging up.
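
      For illustration, here is a rough Java sketch of option 2: a hypothetical helper that, while the fsck runs, periodically writes and flushes a filler byte so the client's read timeout never expires. The class name, filler character, and period are assumptions, not the attached patch:

      import java.io.PrintWriter;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;

      public class KeepAliveWriter implements AutoCloseable {
          private final ScheduledExecutorService scheduler =
                  Executors.newSingleThreadScheduledExecutor();

          // Periodically write one filler character to 'out' and flush it.
          public KeepAliveWriter(final PrintWriter out, long periodSeconds) {
              scheduler.scheduleAtFixedRate(() -> {
                  synchronized (out) {
                      out.print('\0');   // the "zeros" of option 2; a space or newline would also do
                      out.flush();       // push the byte onto the wire immediately
                  }
              }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
          }

          @Override
          public void close() {
              scheduler.shutdownNow();   // stop the keep-alive once the report is complete
          }
      }

      On the server side, whatever writes the fsck report to the servlet's PrintWriter could hold one of these open for the duration of the check (synchronizing its own writes on the same writer); the client would then only need to tolerate or strip the filler characters in the output.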

      Attachments

        1. HDFS-7175.patch
          0.9 kB
          Akira Ajisaka
        2. HDFS-7175.patch
          0.9 kB
          Akira Ajisaka
        3. HDFS-7175.2.patch
          1.0 kB
          Akira Ajisaka
        4. HDFS-7175.3.patch
          0.8 kB
          Akira Ajisaka
        5. HDFS-7157.004.patch
          3 kB
          Stephen O'Donnell


            People

              Assignee: sodonnell Stephen O'Donnell
              Reporter: cwsteinbach Carl Steinbach
              Votes: 0
              Watchers: 17
