Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.0.1
-
None
Description
We've run into a strange issue with a zombie worker process. It looks like the worker pid file somehow got deleted without the worker process shutting down. This causes the supervisor to try repeatedly to kill the worker unsuccessfully, and means multiple workers may be assigned to the same port. The worker root folder sticks around because the worker is still heartbeating to it.
It may or may not be related that we've seen Nimbus occasionally enter an infinite loop of printing logs similar to the below.
2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
Which continues until Nimbus is rebooted. We also see repeating blocks similar to the logs below.
2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up ZendeskTicketTopology-127-1464780171 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar) 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser) 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
Attachments
Attachments
Issue Links
- relates to
-
STORM-1934 Race condition between sync-supervisor and sync-processes raises several strange issues
- Resolved