HDFS-4477 deliberately did not acquire the fsn lock during token cancellation, based on the belief that edit logs are thread-safe. However, log rolling is not thread-safe. Failure to externally synchronize on the fsn lock during a roll can cause problems.
With sync edit logging, it may corrupt the log by interspersing edits with the end/start segment edits. With async edit logging, it may deadlock if the log queue overflows. Luckily, losing the race is extremely rare. In ~5 years, we had never encountered it. However, HDFS-13051 lost the race with async edits.
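
The sync-logging hazard can be illustrated with a minimal, self-contained sketch. This is not HDFS code: `EditLogRollSketch`, `append`, `roll`, `logEdit`, and the use of a `ReentrantReadWriteLock` as a stand-in for the fsn lock are all hypothetical names and assumptions made for illustration. It shows a log whose individual appends are thread-safe, but whose roll is two separate appends that only stay adjacent if callers synchronize externally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class EditLogRollSketch {
    // Hypothetical stand-in for the namesystem (fsn) lock.
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    private final List<String> edits = new ArrayList<>();

    // A single append is thread-safe on its own.
    private synchronized void append(String op) {
        edits.add(op);
    }

    // A roll is two appends. Nothing inside the log keeps them adjacent,
    // so the caller must exclude other writers for the duration of the roll.
    public void roll() {
        fsnLock.writeLock().lock();
        try {
            append("END_SEGMENT");
            append("START_SEGMENT");
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // Ordinary edits (e.g. a token cancellation) take the read lock, so they
    // may run concurrently with each other but never in the middle of a roll.
    public void logEdit(String op) {
        fsnLock.readLock().lock();
        try {
            append(op);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    public synchronized List<String> snapshot() {
        return new ArrayList<>(edits);
    }

    // True if every END_SEGMENT is immediately followed by START_SEGMENT.
    public synchronized boolean segmentsIntact() {
        for (int i = 0; i < edits.size(); i++) {
            if (edits.get(i).equals("END_SEGMENT")
                    && (i + 1 >= edits.size()
                        || !edits.get(i + 1).equals("START_SEGMENT"))) {
                return false;
            }
        }
        return true;
    }

    // Runs a roller thread against a token-cancelling thread.
    public static boolean demo() {
        EditLogRollSketch log = new EditLogRollSketch();
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 1000; i++) log.roll();
        });
        Thread canceller = new Thread(() -> {
            for (int i = 0; i < 1000; i++) log.logEdit("CANCEL_DELEGATION_TOKEN");
        });
        roller.start();
        canceller.start();
        try {
            roller.join();
            canceller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.segmentsIntact();
    }

    public static void main(String[] args) {
        System.out.println("segments intact: " + demo());
    }
}
```

If `logEdit` skipped the read lock, as the token cancellation path effectively did, a cancellation edit could land between `END_SEGMENT` and `START_SEGMENT`. The sketch does not model the async deadlock, which depends on the edit queue filling up while a roll is in progress.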
- is caused by: HDFS-4477 Secondary namenode may retain old tokens (Closed)
- is related to: HDFS-13051 Fix dead lock during async editlog rolling if edit queue is full (Resolved)
- requires: HADOOP-15212 Add independent secret manager method for logging expired tokens (Resolved)