STORM-1879 (Apache Storm)

Supervisor may not shut down workers cleanly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.1
    • Fix Version/s: 2.0.0, 1.0.2, 1.1.0
    • Component/s: storm-core
    • Labels: None

    Description

      We've run into a strange issue with a zombie worker process. It looks like the worker's pid file was somehow deleted without the worker process shutting down. As a result, the supervisor repeatedly tries and fails to kill the worker, and multiple workers may end up assigned to the same port. The worker root folder sticks around because the worker is still heartbeating to it.
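
The failure mode described above can be illustrated with a minimal sketch. It is not Storm's supervisor code: the class name, the method names, and the layout (one file per pid, named after the pid, in a `pids` directory under the worker root) are assumptions made for illustration. It shows why a kill routine driven only by pid files has nothing to signal once the pid file is gone, even though the worker process is still running.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PidFileKillSketch {

    /** Reads pids from file names under workerRoot/pids (assumed layout, not taken from the report). */
    static List<Long> recordedPids(Path workerRoot) throws IOException {
        Path pidsDir = workerRoot.resolve("pids");
        List<Long> pids = new ArrayList<>();
        if (!Files.isDirectory(pidsDir)) {
            return pids; // no pid directory: nothing is known about the worker
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(pidsDir)) {
            for (Path entry : entries) {
                pids.add(Long.parseLong(entry.getFileName().toString()));
            }
        }
        return pids;
    }

    /** Requests termination of every recorded pid; returns how many requests were accepted. */
    static int killRecorded(Path workerRoot) throws IOException {
        int signalled = 0;
        for (long pid : recordedPids(workerRoot)) {
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (handle.isPresent() && handle.get().destroy()) {
                signalled++;
            }
        }
        // If the pid file was deleted, this is 0 even though the worker is alive.
        return signalled;
    }
}
```

With an empty `pids` directory, `killRecorded` returns 0 on every attempt, which matches the symptom of a supervisor that keeps trying to shut down a worker without effect.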

      It may or may not be related, but we've also seen Nimbus occasionally enter an infinite loop of printing logs similar to the following.

      2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
      2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
      2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
      2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
      2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
      2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
      

      This continues until Nimbus is rebooted. We also see repeating blocks similar to the logs below.

      2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up ZendeskTicketTopology-127-1464780171
      2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
      2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
      2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
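
When triaging a Nimbus log that loops like the BlobStoreUtils errors quoted earlier, it can help to tally the failures per blob key. The following is a small sketch for that purpose only. The class and method names are hypothetical, and the pattern relies on the message format exactly as quoted in this report, where no space separates "key" from the blob key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlobErrorTally {

    // Matches the message as quoted in the report: "key" is followed directly by the blob key.
    private static final Pattern UPDATE_FAILURE =
            Pattern.compile("Could not update the blob with key(\\S+)");

    /** Counts "Could not update the blob" errors per blob key, in order of first appearance. */
    static Map<String, Integer> countByKey(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = UPDATE_FAILURE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding the six quoted BlobStoreUtils lines into `countByKey` yields two keys, the `stormconf.ser` and `stormcode.ser` blobs of the same topology, each with a count of 3.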
      

      Attachments

        1. nimbus-supervisor.zip
          6.42 MB
          Stig Rohde Døssing
        2. supervisor.log
          1.10 MB
          Nico Meyer
        3. fix_missing_worker_pid.patch
          1 kB
          Nico Meyer

            People

              Assignee: kabhwan Jungtaek Lim
              Reporter: srdo Stig Rohde Døssing