Yes, the scenario you are describing is possible. During distributed log splitting, the general steps are:
1. A node (region server) crashes.
2. Other nodes pick up the WALs of the crashed node; on each of those nodes, a worker thread starts splitting the WALs it received.
3. The worker thread creates a "recovered.edits" folder under the region whose edits are recorded in the WAL, if it does not already exist, for example,
4. Then it writes the edits from the WAL to a temp file under the "recovered.edits" folder, like
. In this step, each thread writes to a different file, so there are no conflicts.
5. Finally, after it has finished reading that WAL file, it renames the temp file to
for log replay. In this step, it updates the "recovered.edits" folder's last-modified-time property. Multiple threads may update the same folder, but this is fixed in
6. Go back to step 3.
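The per-WAL loop in steps 3-5 can be sketched roughly as below, using `java.nio.file` as a stand-in for the HDFS/WASB FileSystem API. The class and method names here are illustrative, not the actual HBase splitter classes:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the per-WAL split work (steps 3-5): create the region's
// "recovered.edits" folder, write edits to a per-worker temp file, then
// rename the temp file so log replay can find it. Illustrative names only.
public class SplitWorkerSketch {
    // Write one WAL's edits for a region, then publish them via rename.
    static Path splitOneRegion(Path regionDir, String walName, byte[] edits)
            throws IOException {
        // Step 3: create "recovered.edits" under the region if it does not exist.
        Path recoveredEdits = regionDir.resolve("recovered.edits");
        Files.createDirectories(recoveredEdits);

        // Step 4: write edits to a per-worker temp file; no conflicts here,
        // because each worker uses its own temp file name.
        Path temp = recoveredEdits.resolve(walName + ".temp");
        Files.write(temp, edits);

        // Step 5: rename the temp file to its final name for log replay.
        Path finalFile = recoveredEdits.resolve(walName);
        return Files.move(temp, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path region = Files.createTempDirectory("region");
        Path out = splitOneRegion(region, "0000001", "edit-data".getBytes());
        System.out.println(Files.exists(out));  // true: edits published
    }
}
```

Note that only step 3 touches a shared path; steps 4 and 5 operate on per-worker file names, which is why the conflict shows up in the mkdir.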
Since the WALs contain edits for overlapping sets of regions, the worker threads on different nodes may perform steps 3-5 at the same time on the same region folder path.
So you think the exception happens when, for example, worker thread 1 is doing step 5 while worker thread 2 is doing step 3. However, if you look at mkdir in WASB, it pre-checks whether "recovered.edits" has already been created. So this case requires thread 1 to finish steps 3-5 in the window between thread 2 passing the pre-check and thread 2 starting to create the empty blob in step 3. That window is very short, but the race is possible.
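That check-then-act window can be demonstrated with a toy model, where the "blob store" is just a concurrent set and latches force the unlucky interleaving. Everything here is a simulation of the WASB mkdir shape, not the real driver code:

```java
import java.util.Set;
import java.util.concurrent.*;

// Sketch of the check-then-act window in a WASB-style mkdir: the pre-check
// ("does the folder blob exist?") and the create are two separate calls, so
// another worker can create the blob in between. Illustrative names only.
public class MkdirRaceSketch {
    static final Set<String> blobStore = ConcurrentHashMap.newKeySet();

    // Non-atomic mkdir: pre-check, then create. Fails if the blob appears
    // between the two calls (the short window described above).
    static void mkdir(String path, CountDownLatch passedPrecheck,
                      CountDownLatch mayCreate) throws Exception {
        if (blobStore.contains(path)) return;   // pre-check: already created
        passedPrecheck.countDown();
        mayCreate.await();                      // widen the window for the demo
        if (!blobStore.add(path)) {             // create the empty folder blob
            throw new IllegalStateException("blob already exists: " + path);
        }
    }

    static boolean demonstrateRace() throws Exception {
        blobStore.clear();
        CountDownLatch passed = new CountDownLatch(1);
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Worker 2 passes the pre-check first, then blocks inside the window.
        Future<?> worker2 = pool.submit(() -> {
            mkdir("region/recovered.edits", passed, go);
            return null;
        });
        passed.await();
        // Worker 1 finishes steps 3-5 inside the window: the blob now exists.
        blobStore.add("region/recovered.edits");
        go.countDown();                          // worker 2 resumes and collides
        try { worker2.get(); return false; }
        catch (ExecutionException e) { return true; }  // the reported failure
        finally { pool.shutdown(); }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demonstrateRace());   // true: worker 2 hit the collision
    }
}
```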
However, a more likely case is when multiple nodes receive WALs from the crashed node and, at about the same time, each finds the first edits in its WAL that belong to the same region. They all try to do step 3; since these edits are the first ones for that region in each WAL, there is no "recovered.edits" folder under that region yet, so the pre-check passes for all worker threads. Then all of them try to create an empty blob on the same path. The exception happens when calling
. Looking into the code, although WASB does not explicitly acquire a lease on this call, this is a write operation, so a lease is automatically acquired in the SDK layer. If worker thread 1 acquires the lease, the other worker threads fail here.
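The effect of that implicit lease is that exactly one of the racing workers succeeds. A minimal simulation, with the lease modeled as an atomic flag on one blob path (all names assumed, not the real SDK API):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

// Sketch of the SDK-level lease behavior: every write to a blob implicitly
// takes a lease, so when N workers race to create the same empty folder blob,
// exactly one acquires the lease and the rest fail. Illustrative names only.
public class LeaseRaceSketch {
    static int raceToCreate(int workers) throws Exception {
        AtomicBoolean lease = new AtomicBoolean(false);  // lease on one blob path
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                start.await();                            // all workers race at once
                // Implicit lease acquisition inside the write path:
                if (lease.compareAndSet(false, true)) {
                    winners.incrementAndGet();            // this worker creates the blob
                }                // the others see "There is currently a lease..."
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return winners.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(raceToCreate(4));   // 1: only one worker wins the lease
    }
}
```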
In either case, this mkdir operation is simply trying to create a "recovered.edits" folder (or other folders in other scenarios). So if the exception is "There is currently a lease on the blob...", the folder/blob has already been created, and we just need to return success rather than throw an exception here.
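The proposed fix can be sketched as below. The store and the exception type are stand-ins for the real WASB driver classes; the point is only the shape of the handling, namely that a lease conflict on a folder-create is converted into success when the blob already exists:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposed fix: mkdir only needs the folder blob to exist, so
// a "There is currently a lease on the blob" failure caused by another
// worker's concurrent create is treated as success. Illustrative names only.
public class MkdirFixSketch {
    static class LeaseConflictException extends RuntimeException {
        LeaseConflictException() { super("There is currently a lease on the blob..."); }
    }

    // Simulated blob store: creating a given path succeeds once, then throws,
    // mimicking the loser of the lease race.
    static final Set<String> store = ConcurrentHashMap.newKeySet();

    static void createFolderBlob(String path) {
        if (!store.add(path)) throw new LeaseConflictException();
    }

    // mkdir with the fix applied: a lease conflict means another worker has
    // already created the folder, so report success instead of failing.
    static boolean mkdir(String path) {
        try {
            createFolderBlob(path);
            return true;
        } catch (LeaseConflictException e) {
            return store.contains(path);   // folder already exists: success
        }
    }

    public static void main(String[] args) {
        System.out.println(mkdir("region/recovered.edits"));  // true: created
        System.out.println(mkdir("region/recovered.edits"));  // true: conflict treated as success
    }
}
```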