Details
- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Version: 3.3.0
Description
There is a critical problem in RBF. HDFS-15078 resolves it in some scenarios, but there is no overall resolution yet.
The problem is as follows:
1. A client using RBF (routers r0, r1) creates an HDFS file via r0; the call fails with an exception and the client fails over to r1.
2. However, r0 has already sent the create RPC to the NameNode (1st create).
3. The client creates the HDFS file again via r1 (2nd create).
4. The client writes the HDFS file and finally closes it (3rd close).
The NameNode may receive these RPCs in the following order:
1. 2nd create
2. 3rd close
3. 1st create
Since overwrite is true by default, the late 1st create turns a file that has already been written into an empty file. This is a critical problem.
We have encountered this problem in production: many Hive and Spark jobs run on our cluster, and it occurs occasionally.
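The reordering above can be sketched with a minimal in-memory simulation (this is not HDFS code; the class and paths are hypothetical). It models only the create-with-overwrite semantics, and shows how the delayed 1st create empties the already-written file:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a namespace to illustrate the race described above.
// create(path, overwrite=true) discards any existing content, mirroring
// the default HDFS create semantics.
class NamespaceSketch {
    private final Map<String, StringBuilder> files = new HashMap<>();

    void create(String path, boolean overwrite) {
        if (overwrite || !files.containsKey(path)) {
            files.put(path, new StringBuilder());
        }
    }

    void write(String path, String data) {
        files.get(path).append(data);
    }

    String read(String path) {
        return files.get(path).toString();
    }

    public static void main(String[] args) {
        NamespaceSketch nn = new NamespaceSketch();
        // Order in which the NameNode happens to process the RPCs:
        nn.create("/tmp/out", true);   // 2nd create (via r1)
        nn.write("/tmp/out", "data");  // client writes, then closes (3rd close)
        nn.create("/tmp/out", true);   // 1st create (via r0) arrives late
        // The late create truncated the file, so it is now empty.
        System.out.println(nn.read("/tmp/out").isEmpty()); // prints "true"
    }
}
```

The same sequence with overwrite=false would leave the written data intact, which is why the default overwrite behavior is central to this bug.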
Attachments
Issue Links
- causes
  - HDFS-16738 Invalid CallerContext caused NullPointerException (Resolved)
- fixes
  - HDFS-17268 when SocketTimeoutException happen, overwrite mode can delete old data, and make file empty (Resolved)
- is related to
  - HDFS-16756 RBF proxies the client's user by the login user to enable CacheEntry (Resolved)
- relates to
  - HDFS-15310 RBF: Not proxy client's clientId and callId caused RetryCache invalid in NameNode. (Resolved)
  - HDFS-16388 The namenode should check the connection channel before the RPC returns data (Open)
  - HDFS-15078 RBF: Should check connection channel before sending RPC to namenode (Patch Available)