[HDFS-7175] Client-side SocketTimeoutException during Fsck - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0, 3.1.4, 3.2.2
Component/s: namenode
Labels: None
Target Version/s: 3.3.0

Description

HDFS-2538 disabled status reporting for the fsck command (it can optionally be re-enabled with the -showprogress option). We have observed that, without status reporting, the client aborts with a read timeout:

[hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@GRID.LINKEDIN.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
Exception in thread "main" java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
	at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
	at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)

Because the server sends nothing until fsck finishes, the client has nothing to read and aborts whenever the fsck operation takes longer than the client's read timeout setting.
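The failure mode can be reproduced without a cluster. The following is a minimal sketch, not the DFSck code: a local server accepts a connection and stays silent, and a client with a read timeout gives up with the same SocketTimeoutException. The class and method names are hypothetical, and a plain socket stands in for the HTTP connection to the namenode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
    // Returns true if a read against a silent server hits the client's read timeout.
    static boolean timesOutWhenServerIsSilent(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // The accepted side never writes, like a server that is busy
            // checking the namespace and reports no progress.
            client.setSoTimeout(readTimeoutMillis);
            client.getInputStream().read(); // blocks until data arrives or the timeout fires
            return false;
        } catch (SocketTimeoutException e) {
            return true; // "Read timed out"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutWhenServerIsSilent(200));
    }
}
```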

I can think of a couple of ways to fix this:

- Set an infinite read timeout on the client side (not a good idea!).
- Have the server side write (and flush) zeros to the wire, and instruct the client to ignore these characters instead of echoing them.
- It's possible that flushing an empty buffer on the server side will trigger an HTTP response with a zero-length payload. This may be enough to keep the client from hanging up.
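The client half of the second idea can be sketched as follows. This is an illustration only, not the change made by the attached patches: the class name, the method name, and the choice of the NUL character as the keep-alive byte are all assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class KeepAliveFilterSketch {
    // Assumption: the server pads the response with NUL characters while fsck runs.
    static final char KEEP_ALIVE = '\0';

    // Reads the whole response and drops keep-alive characters instead of echoing them.
    static String readIgnoringKeepAlive(Reader in) {
        StringBuilder report = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c != KEEP_ALIVE) {
                    report.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Simulated wire content: two keep-alive characters, then real output.
        String wire = "" + KEEP_ALIVE + KEEP_ALIVE + "fsck report line\n";
        System.out.print(readIgnoringKeepAlive(new StringReader(wire)));
    }
}
```

Each keep-alive character that arrives resets the client's read timer, so the connection survives as long as the server flushes more often than the timeout interval.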

Attachments

- HDFS-7175.patch, 01/Oct/14 08:08, 0.9 kB, Akira Ajisaka
- HDFS-7175.patch, 02/Oct/14 02:21, 0.9 kB, Akira Ajisaka
- HDFS-7175.3.patch, 08/Oct/14 10:47, 0.8 kB, Akira Ajisaka
- HDFS-7175.2.patch, 02/Oct/14 08:18, 1.0 kB, Akira Ajisaka
- HDFS-7157.004.patch, 24/Jan/20 11:57, 3 kB, Stephen O'Donnell

Issue Links

- is broken by: HDFS-2538 option to disable fsck dots (Resolved)
- relates to: HDFS-15216 Wrong Use Case of -showprogress in fsck (Resolved)

People

Assignee: Stephen O'Donnell
Reporter: Carl Steinbach
Votes: 0
Watchers: 17

Dates

Created: 01/Oct/14 04:14
Updated: 12/Mar/20 12:12
Resolved: 01/Feb/20 00:14