HIVE-13513: cleardanglingscratchdir does not work in some versions of HDFS


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.1.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      On some Hadoop versions, we keep getting a "lease recovery" message at the time we check a scratchdir by opening it for append:

      Failed to APPEND_FILE xxx for DFSClient_NONMAPREDUCE_785768631_1 on 10.0.0.18 because lease recovery is in progress. Try again later.
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2917)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2677)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2984)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2953)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:655)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
      

      and

      16/04/14 04:51:56 ERROR hdfs.DFSClient: Failed to close inode 18963
      java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]], original=[DatanodeInfoWithStorage[10.0.0.12:30010,DS-b355ac2a-a23a-418a-af9b-4c1b4e26afe8,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1017)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1165)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
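
      The second failure names a client-side setting, dfs.client.block.write.replace-datanode-on-failure.policy. Purely as a hedged illustration of that knob (the class name below is made up for the example, and this is not the fix adopted in this issue), a DFS client on a very small cluster could relax the policy in its own Configuration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;

      public class RelaxedPipelineClient {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Tell the DFS client not to insist on replacing a failed datanode in the
          // append pipeline, which is what the error above reports it cannot do.
          conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
          FileSystem fs = FileSystem.get(conf);
          System.out.println("Connected to " + fs.getUri());
        }
      }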
      

      The root cause is not clear. However, if we remove the hsync call from SessionState, everything works as expected. Attaching a patch to remove the hsync call for now.
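
      For context, the check that fails above is an append-based liveness probe: if a session still holds the scratchdir's lock file open, the NameNode refuses a second writer. The sketch below is illustrative only, assuming standard HDFS FileSystem/RemoteException behavior; the class name, method, and lock-file path are assumptions, not the actual ClearDanglingScratchDir code.

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
      import org.apache.hadoop.ipc.RemoteException;

      public class ScratchDirProbe {
        /** Returns true when nothing holds a lease on the lock file, i.e. the scratch dir looks dangling. */
        static boolean isDangling(FileSystem fs, Path lockFile) throws IOException {
          try {
            // A live HiveCLI/HiveServer2 session keeps the lock file open for write,
            // so a second writer's append is rejected with AlreadyBeingCreatedException.
            FSDataOutputStream out = fs.append(lockFile);
            out.close();
            return true;    // append succeeded: no live writer, the scratch dir is dangling
          } catch (RemoteException e) {
            if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
              return false; // lease is held by a live session
            }
            throw e;        // e.g. the "lease recovery is in progress" error seen above
          }
        }

        public static void main(String[] args) throws IOException {
          FileSystem fs = FileSystem.get(new Configuration());
          Path lock = new Path(args[0]); // e.g. the lock file under a session scratch dir
          System.out.println(lock + (isDangling(fs, lock) ? " looks dangling" : " is still in use"));
        }
      }

      On the affected HDFS builds the append is instead rejected with "lease recovery is in progress", so the check fails outright; per the description, removing the hsync call in SessionState avoids the problem.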

      Attachments

        1. HIVE-13513.1.patch
          0.9 kB
          Daniel Dai
        2. HIVE-13513.2.patch
          3 kB
          Daniel Dai

        Issue Links

          Activity

            People

              Assignee: daijy Daniel Dai
              Reporter: daijy Daniel Dai
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: