While working on PR #382 for ACCUMULO-4782, I noticed a significant concurrency bug. Before #382 there was a single lock for the session manager. The session manager cleans up idle sessions. This cleanup should happen outside the session manager lock, because all tserver read/write operations use the session manager, so it needs to stay responsive.
The bug is as follows:
- Both getActiveScansPerTable() and getActiveScans() lock the session manager and then lock idleSessions. See SessionManager line 233.
- The sweep() method locks idleSessions and does cleanup while this lock is held. See SessionManager line 200.
Therefore it is possible for getActiveScansPerTable() or getActiveScans() to lock the session manager and then block trying to lock idleSessions while cleanup is happening in sweep(). Because the session manager lock is still held during that wait, all other access to the session manager is blocked until cleanup finishes.
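
To make the locking pattern concrete, here is a minimal sketch of one way to avoid the problem: sweep() holds the idleSessions lock only long enough to drain the list, and runs the cleanup after releasing it. This is not the actual SessionManager code or the change in #382. The class SessionManagerSketch, the Session stand-in, and the addSession() and markIdle() helpers are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the locking described above.
public class SessionManagerSketch {

  // Hypothetical stand-in for a tserver session.
  static class Session {
    final String table;
    volatile boolean cleanedUp = false;

    Session(String table) {
      this.table = table;
    }

    // In a real server this could be slow, which is why it must not
    // run while a lock that other threads need is held.
    void cleanup() {
      cleanedUp = true;
    }
  }

  // Guarded by the session manager lock ("this").
  private final Map<Long,Session> sessions = new HashMap<>();
  // Guarded by its own monitor.
  private final List<Session> idleSessions = new ArrayList<>();

  synchronized void addSession(long id, Session session) {
    sessions.put(id, session);
  }

  // Moves a session to the idle list; the two locks are never nested here.
  void markIdle(long id) {
    Session session;
    synchronized (this) {
      session = sessions.remove(id);
    }
    if (session != null) {
      synchronized (idleSessions) {
        idleSessions.add(session);
      }
    }
  }

  // Drain the idle list under the lock, then clean up with no lock held.
  void sweep() {
    List<Session> toClean;
    synchronized (idleSessions) {
      toClean = new ArrayList<>(idleSessions);
      idleSessions.clear();
    }
    for (Session session : toClean) {
      session.cleanup();
    }
  }

  // Still takes the manager lock and then the idleSessions lock, but the
  // inner lock is now only ever held for a short copy, so the wait is brief.
  Map<String,Integer> getActiveScansPerTable() {
    List<Session> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(sessions.values());
      synchronized (idleSessions) {
        snapshot.addAll(idleSessions);
      }
    }
    Map<String,Integer> counts = new HashMap<>();
    for (Session session : snapshot) {
      counts.merge(session.table, 1, Integer::sum);
    }
    return counts;
  }
}
```

In this sketch a caller of getActiveScansPerTable() can wait on idleSessions only for as long as sweep() takes to copy and clear the list, not for the duration of the cleanup itself.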
The changes in #382 will fix this for 1.9.0 and 2.0.0. However, I am not sure about backporting #382 to 1.7. Either a more targeted fix could be made for 1.7, or #382 could be backported.