When transitioning a standby NameNode (SBN) to active, I ran into the following situation:
- The TrashPolicy is first loaded by an IPC Server Handler thread. Its initialize function then tries to make an RPC to the same node to find out the defaults.
- This happens while holding the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock.
- Since no handler threads are free to serve it, the RPC blocks forever and the NN never enters active state.
We need a general policy that the NN should never make RPCs to itself for any reason, because of the potential for deadlocks like this.
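To make the failure mode concrete, here is a minimal standalone sketch of the same pattern. It is not NameNode code: the class and method names (SelfRpcDeadlockDemo, selfCallCompletes), the fixed thread pool standing in for the IPC handler threads, the ReentrantReadWriteLock standing in for the NN lock, and the 500 ms timeout are all assumptions made for illustration. The real self-RPC has no such timeout and blocks forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SelfRpcDeadlockDemo {

    /**
     * Simulates one handler holding the write lock and calling back into
     * its own handler pool. Returns true if the self-call was served,
     * false if it timed out because no handler thread was free.
     */
    static boolean selfCallCompletes(int handlers, int competingRequests) throws Exception {
        if (competingRequests > handlers - 1) {
            throw new IllegalArgumentException("competingRequests must be <= handlers - 1");
        }
        // Hypothetical stand-ins for the IPC handler threads and the NN lock.
        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        ReentrantReadWriteLock nnLock = new ReentrantReadWriteLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch competitorsRunning = new CountDownLatch(competingRequests);

        // Handler 1: the "transition to active" running under the write lock.
        Future<Boolean> transition = handlerPool.submit(() -> {
            nnLock.writeLock().lock();
            try {
                lockHeld.countDown();
                // Wait until the competing requests occupy their handler threads.
                competitorsRunning.await();
                // The "RPC to self": it needs a free handler thread in the same pool.
                Future<String> selfRpc = handlerPool.submit(() -> "defaults");
                try {
                    selfRpc.get(500, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    // Timeout exists only so the demo terminates.
                    selfRpc.cancel(true);
                    return false;
                }
            } finally {
                nnLock.writeLock().unlock();
            }
        });

        lockHeld.await();
        // Other client requests: each takes a handler thread, then blocks on the lock.
        for (int i = 0; i < competingRequests; i++) {
            handlerPool.submit(() -> {
                competitorsRunning.countDown();
                nnLock.readLock().lock();
                nnLock.readLock().unlock();
            });
        }
        try {
            return transition.get();
        } finally {
            handlerPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // All 4 handlers busy (1 holds the lock, 3 wait on it): self-call starves.
        System.out.println(selfCallCompletes(4, 3));
        // One handler left free: self-call is served.
        System.out.println(selfCallCompletes(4, 2));
    }
}
```

With every handler occupied, the first call is expected to report false, and with one handler left free the second is expected to report true. The point of the sketch is that the self-call only succeeds when a spare handler happens to exist, which is why avoiding self-RPCs entirely is safer than sizing the handler pool around them.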
|Assignee||Eli Collins [ eli ]|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Affects Version/s||3.0.0 [ 12320356 ]|
|Target Version/s||3.0.0, 2.2.0-alpha [ 12320356, 12322472 ]||2.2.0-alpha [ 12322472 ]|
|Project||Hadoop HDFS [ 12310942 ]||Hadoop Common [ 12310240 ]|
|Affects Version/s||2.2.0-alpha [ 12322473 ]|
|Affects Version/s||2.2.0-alpha [ 12322472 ]|
|Target Version/s||2.2.0-alpha [ 12322472 ]||2.2.0-alpha [ 12322473 ]|
|Component/s||trash [ 12319645 ]|
|Component/s||name-node [ 12312926 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Hadoop Flags||Reviewed [ 10343 ]|
|Target Version/s||2.2.0-alpha [ 12322473 ]|
|Fix Version/s||2.2.0-alpha [ 12322473 ]|
|Resolution||Fixed [ 1 ]|
Arun C Murthy made changes -
|Status||Resolved [ 5 ]||Closed [ 6 ]|