Hadoop HDFS / HDFS-5299

DFS client hangs in updatePipeline RPC when failover happened


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta, 3.0.0-alpha1
    • Fix Version/s: 2.2.0
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The DFSClient hung in the updatePipeline call to the NameNode when a failover happened at exactly the same time.

      When we dug into it, the issue was found to be in the handling of the RetryCache in updatePipeline.

      Here are the steps:
      1. The client was writing slowly.
      2. One of the datanodes in the write pipeline went down, so updatePipeline was called on the active NameNode (ANN).
      3. The call reached the ANN, but the ANN was shut down while processing the updatePipeline call.
      4. The client retried against the other NameNode (since the API is marked @AtMostOnce). That NN was still in STANDBY state, so the call failed with a StandbyException.
      5. One more client failover happened.
      6. The standby NN became active.
      7. The client called the current ANN again for updatePipeline.

      Now the client call hangs in the NN, waiting for the cached call with the same callId to complete. But that cached call already completed the last time, with a StandbyException.
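
      The retries above happen on the client side because updatePipeline is declared @AtMostOnce in ClientProtocol: the failover retry policy may safely re-send a non-idempotent call to another NameNode because the server-side RetryCache is expected to deduplicate it by callId. A sketch of that declaration (signature abridged; later releases add more parameters):

          // org.apache.hadoop.hdfs.protocol.ClientProtocol (abridged)
          @AtMostOnce
          void updatePipeline(String clientName, ExtendedBlock oldBlock,
              ExtendedBlock newBlock, DatanodeID[] newNodes /* ... */)
              throws IOException;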

      Conclusion:
      Whenever a new entry is added to the retry cache, we need to record the result of the call in that entry before returning from the call or before throwing an exception.
      I can see a similar issue in multiple RPCs in FSNamesystem.
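
      As a hedged sketch of that conclusion (not the actual HDFS-5299 patch; the surrounding method, the retryCache field and checkOperation(...) are illustrative stand-ins for the FSNamesystem code, while waitForCompletion/isSuccess/setState are the org.apache.hadoop.ipc.RetryCache calls), the pattern is to complete the cache entry in a finally block so its result is recorded even when an exception such as StandbyException is thrown:

          void updatePipeline(/* parameters elided */) throws IOException {
            RetryCache.CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
            if (cacheEntry != null && cacheEntry.isSuccess()) {
              return; // a previous attempt with the same callId already succeeded
            }
            boolean success = false;
            try {
              checkOperation(OperationCategory.WRITE); // may throw StandbyException
              // ... perform the actual pipeline update ...
              success = true;
            } finally {
              // Record the outcome even when an exception is thrown; otherwise a
              // later retry with the same callId blocks forever waiting on this
              // still "in progress" cache entry.
              RetryCache.setState(cacheEntry, success);
            }
          }

      Without that, the StandbyException path leaves the entry pending, which is exactly what the later retry on the newly active NN blocks on.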

      Attachments

        1. HDFS-5299.000.patch (11 kB, Jing Zhao)
        2. HDFS-5299.patch (8 kB, Vinayakumar B)


          People

            Assignee: Vinayakumar B (vinayakumarb)
            Reporter: Vinayakumar B (vinayakumarb)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: