Description
HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option). We have observed that without status reporting the client will abort with read timeout:
[hdfs@lva1-hcl0030 ~]$ hdfs fsck / Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@GRID.LINKEDIN.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out Exception in thread "main" java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312) at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149) at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
Since there's nothing for the client to read it will abort if the time required to complete the fsck operation is longer than the client's read timeout setting.
I can think of a couple ways to fix this:
- Set an infinite read timeout on the client side (not a good idea!).
- Have the server-side write (and flush) zeros to the wire and instruct the client to ignore these characters instead of echoing them.
- It's possible that flushing an empty buffer on the server-side will trigger an HTTP response with a zero length payload. This may be enough to keep the client from hanging up.
Attachments
Attachments
Issue Links
- is broken by
-
HDFS-2538 option to disable fsck dots
- Resolved
- relates to
-
HDFS-15216 Wrong Use Case of -showprogress in fsck
- Resolved