Details
- Sub-task
- Status: Open
- Major
- Resolution: Unresolved
- None
- None
- None
Description
When the SPSPathIdProcessor thread calls getNextSPSPath(), it gets the pathId from the NameNode, and the NameNode also removes this pathId from the pathsToBeTraveresed queue:
public Long getNextPathId() {
  synchronized (pathsToBeTraveresed) {
    return pathsToBeTraveresed.poll();
  }
}
If the SPS process restarts after fetching a pathId, the move operation for that path is not resumed until the NameNode restarts.
So we want to provide a way for the SPS to continue the move operation after an SPS restart.
First solution:
1) When the SPSPathIdProcessor thread calls getNextSPSPath(), the NameNode returns the pathId and moves it to a pathsBeingTraveresed queue;
2) After the SPS finishes the movement operation for a path, it makes an RPC call to the NameNode to remove this pathId from the pathsBeingTraveresed queue;
3) If the SPS restarts, the SPSPathIdProcessor thread makes an RPC call to the NameNode to get all pathIds in the pathsBeingTraveresed queue.
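The three steps above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not NameNode code: the class name InProgressPathTracker and the methods addPathId, markPathDone and getInProgressPathIds are hypothetical, and the RPC layer is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical NameNode-side bookkeeping for the first solution. */
class InProgressPathTracker {
  // Paths waiting to be handed to the SPS (field names follow the issue text).
  private final Queue<Long> pathsToBeTraveresed = new ArrayDeque<>();
  // Paths handed to the SPS whose movement has not been confirmed yet.
  private final Set<Long> pathsBeingTraveresed = new LinkedHashSet<>();

  /** Queue a path for satisfaction. */
  public synchronized void addPathId(long pathId) {
    pathsToBeTraveresed.add(pathId);
  }

  /** Step 1: hand out the next path and remember it as in progress. */
  public synchronized Long getNextPathId() {
    Long pathId = pathsToBeTraveresed.poll();
    if (pathId != null) {
      pathsBeingTraveresed.add(pathId);
    }
    return pathId;
  }

  /** Step 2: the SPS reports that the movement for this path is finished. */
  public synchronized boolean markPathDone(long pathId) {
    return pathsBeingTraveresed.remove(Long.valueOf(pathId));
  }

  /** Step 3: a restarted SPS asks which paths were left in progress. */
  public synchronized List<Long> getInProgressPathIds() {
    return new ArrayList<>(pathsBeingTraveresed);
  }
}
```

In this sketch steps 2 and 3 each correspond to an additional RPC from the SPS to the NameNode, which is the extra cost this approach carries.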
Second solution:
We added timeout detection in the application layer: if a path does not complete its movement within the specified time, we can re-satisfy this path even though it already has the "hdfs.sps" xattr.
We chose the second solution because the first solution adds more RPC operations and may affect NameNode performance.
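A minimal sketch of what the application-layer timeout detection could look like is given below. It is not the code used in this issue: the class SpsTimeoutWatchdog, its method names, and the resatisfy callback are all hypothetical, and the callback stands in for whatever call the application uses to request satisfaction of a path again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical application-layer watchdog for the second solution. */
class SpsTimeoutWatchdog {
  private final long timeoutMillis;
  // Assumed hook: re-issues the satisfy request for a path.
  private final Consumer<String> resatisfy;
  // Path -> time (ms) at which the current satisfy request was issued.
  private final Map<String, Long> startTimes = new HashMap<>();

  SpsTimeoutWatchdog(long timeoutMillis, Consumer<String> resatisfy) {
    this.timeoutMillis = timeoutMillis;
    this.resatisfy = resatisfy;
  }

  /** Record that a satisfy request was issued for this path. */
  public synchronized void onSubmitted(String path, long nowMillis) {
    startTimes.put(path, nowMillis);
  }

  /** Record that the movement for this path has completed. */
  public synchronized void onCompleted(String path) {
    startTimes.remove(path);
  }

  /** Re-satisfy every path that has been pending for at least the timeout. */
  public synchronized List<String> checkTimeouts(long nowMillis) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : startTimes.entrySet()) {
      if (nowMillis - e.getValue() >= timeoutMillis) {
        expired.add(e.getKey());
      }
    }
    for (String path : expired) {
      resatisfy.accept(path);
      startTimes.put(path, nowMillis); // restart the clock for this path
    }
    return expired;
  }
}
```

The application would call checkTimeouts periodically, for example from a scheduled task. No extra NameNode RPC is needed for tracking, only the repeated satisfy request for paths that actually time out.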