  Hadoop HDFS / HDFS-7175

Client-side SocketTimeoutException during Fsck


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.1.4, 3.2.2
    • Component/s: namenode
    • Labels: None
    • Target Version/s:

      Description

      HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option). We have observed that without status reporting the client will abort with a read timeout:

      [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
      Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
      14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@GRID.LINKEDIN.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
      Exception in thread "main" java.net.SocketTimeoutException: Read timed out
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:152)
      	at java.net.SocketInputStream.read(SocketInputStream.java:122)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
      	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      	at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
      	at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
      	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
      	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      	at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
      

      Since there's nothing for the client to read, it will abort if the time required to complete the fsck operation is longer than the client's read timeout setting.
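
      To make the failure mode concrete, here is a minimal sketch (not the actual DFSck code; the URL, query string, and 60-second timeout are assumptions) of what the fsck client is effectively doing: HttpURLConnection.getInputStream() blocks until the first bytes of the response arrive, so if the namenode writes nothing until the whole check finishes, the call times out exactly as in the stack trace above.

      import java.io.InputStream;
      import java.net.HttpURLConnection;
      import java.net.URL;

      public class FsckReadTimeoutSketch {
          public static void main(String[] args) throws Exception {
              // Hypothetical namenode fsck URL; host, port, and query string are assumptions.
              URL url = new URL("http://namenode.example.com:50070/fsck?path=%2F");
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              conn.setReadTimeout(60_000); // 60-second read timeout, for illustration only

              // Blocks until the server sends the first bytes of the response body.
              // If the namenode stays silent until the entire fsck pass completes,
              // this throws java.net.SocketTimeoutException: Read timed out.
              try (InputStream in = conn.getInputStream()) {
                  byte[] buf = new byte[4096];
                  int n;
                  while ((n = in.read(buf)) != -1) {
                      System.out.write(buf, 0, n);
                  }
              }
          }
      }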

      I can think of a couple of ways to fix this:

      1. Set an infinite read timeout on the client side (not a good idea!).
      2. Have the server side write (and flush) zeros to the wire and instruct the client to ignore these characters instead of echoing them (a rough sketch of this keep-alive idea follows the list).
      3. It's possible that flushing an empty buffer on the server side will trigger an HTTP response with a zero-length payload. This may be enough to keep the client from hanging up.
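
      Below is a rough sketch of the keep-alive idea behind options 2 and 3. It is only an illustration under assumed names (runWithKeepAlive and fsckTask are made up), not a patch: the fsck work runs on a worker thread while the servlet thread periodically writes a filler character and flushes it, so the client's read timer keeps getting reset.

      import java.io.PrintWriter;

      public class FsckKeepAliveSketch {
          // Runs the (hypothetical) long-running fsck task on a worker thread and,
          // while it is alive, writes a filler character to the response writer
          // every 30 seconds so the client's read timeout never expires.
          static void runWithKeepAlive(PrintWriter out, Runnable fsckTask) throws InterruptedException {
              Thread worker = new Thread(fsckTask, "fsck-worker");
              worker.start();
              while (worker.isAlive()) {
                  out.print('.');      // a zero byte would work just as well (option 2)
                  out.flush();         // push it onto the wire so the client's read unblocks
                  worker.join(30_000); // wait up to 30 seconds before the next keep-alive byte
              }
          }
      }

      As option 2 notes, the client would then have to ignore these filler characters rather than echo them when printing the fsck report.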

    Attachments

    1. HDFS-7175.patch (0.9 kB, Akira Ajisaka)
    2. HDFS-7175.patch (0.9 kB, Akira Ajisaka)
    3. HDFS-7175.2.patch (1.0 kB, Akira Ajisaka)
    4. HDFS-7175.3.patch (0.8 kB, Akira Ajisaka)
    5. HDFS-7157.004.patch (3 kB, Stephen O'Donnell)

    People

    • Assignee: sodonnell Stephen O'Donnell
    • Reporter: cwsteinbach Carl Steinbach
    • Votes: 0
    • Watchers: 19
