Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Carter Shanklin pointed out a case where sending SIGTERM to the Phoenix QueryServer (PQS) did not cause it to exit. I've been able to reproduce this:
1. Start HBase and PQS
2. Stop HBase master
3. Try to run a query through PQS
4. kill -15 <pqs_pid>
At this point, the thread from step 3 is still running in PQS, trying to connect to HBase (following the normal HBase retry policy, which keeps retrying for on the order of minutes). The shutdown hook, which runs to clean up gracefully, blocks while trying to close the PhoenixDriver instance because the read lock is still held by the step 3 query. Because the JVM keeps running until all shutdown hooks return, the outward effect is that PQS stays up until HBase becomes available or the HBase retries time out.
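The blocking described above can be illustrated with a minimal, self-contained sketch. It assumes the close path needs a write lock on the same read/write lock that the in-flight query holds for reading; that is an assumption made for illustration, not a statement about the actual Phoenix code. The class and method names (`BlockedCloseDemo`, `closeSucceedsWhileQueryRuns`) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only: not Phoenix code.
class BlockedCloseDemo {

    /**
     * Simulates a "query" thread that holds a read lock while it waits
     * (a stand-in for HBase connection retries), then checks whether a
     * "close" can obtain the write lock within waitMs milliseconds.
     */
    static boolean closeSucceedsWhileQueryRuns(long waitMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch releaseQuery = new CountDownLatch(1);

        Thread query = new Thread(() -> {
            lock.readLock().lock();
            try {
                readLockHeld.countDown();
                releaseQuery.await(); // stand-in for the long retry loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "query");
        query.setDaemon(true);
        query.start();

        try {
            readLockHeld.await();
            // Assumed close path: needs the write lock, so it cannot
            // proceed while the query still holds the read lock.
            boolean acquired = lock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            releaseQuery.countDown(); // let the simulated query finish
        }
    }
}
```

In this sketch the close attempt gives up after `waitMs`. An untimed `lock()` call in a shutdown hook would instead wait for as long as the query holds the read lock, which matches the hang observed here.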
While the system will eventually recover on its own, it's awkward to send SIGTERM to a process and not have it die within a few seconds. Judging by the code around the shutdown hook registration, the blocking also appears to be unintentional.
A simple fix is to wrap the PhoenixDriver closing in a timeout so that we don't rely on the HBase timeout to exit the JVM.
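As a rough sketch of that idea (not the actual patch), the close call can be run on a separate daemon thread and awaited with a bounded timeout, so the shutdown hook returns even if the close stays blocked. The names `BoundedClose`, `closeWithTimeout`, and `registerShutdownHook` are hypothetical, and the `Runnable` stands in for whatever closes the PhoenixDriver.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: a sketch of the timeout approach, not Phoenix code.
class BoundedClose {

    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMs.
     * Returns true only if the action completed normally within the timeout.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupt the blocked close
            return false;
        } catch (ExecutionException e) {
            return false; // the close itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Registers a shutdown hook that gives closeAction at most timeoutMs to finish. */
    static Thread registerShutdownHook(Runnable closeAction, long timeoutMs) {
        Thread hook = new Thread(() -> {
            if (!closeWithTimeout(closeAction, timeoutMs)) {
                System.err.println("Close did not finish in " + timeoutMs + " ms; exiting anyway");
            }
        }, "bounded-close-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

With a hook registered this way, SIGTERM would be delayed by at most the chosen timeout (a few seconds, say) rather than by the full HBase retry window. The trade-off is that a close cut off by the timeout may leave some cleanup undone.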