Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-616

Shutdown hook does not wait for container to finish

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.10.0, 0.9.1
    • Component/s: container
    • Labels:
      None

      Description

      SAMZA-506 introduced a shutdown hook to set shutdownNow to true in Samza's RunLoop. Unfortunately, this hook returns immediately after shutdowNow=true is set. As soon as the shutdown hook returns, the process dies. We should have the shutdown hook block (with some timeout) until the container fully shuts itself down.

      1. 0001-SAMZA-616.patch
        13 kB
        Tommy Becker
      2. SAMZA-616.0.9.1.patch
        13 kB
        Yi Pan (Data Infrastructure)

        Activity

        Hide
        twbecker Tommy Becker added a comment -

        Chris Riccomini any thoughts regarding the timeout? Does it need to be configurable?

        Show
        twbecker Tommy Becker added a comment - Chris Riccomini any thoughts regarding the timeout? Does it need to be configurable?
        Hide
        criccomini Chris Riccomini added a comment -

        According to yarn-default.xml, there is this setting:

        yarn.nodemanager.sleep-delay-before-sigkill.ms	250	No. of ms to wait between sending a SIGTERM and SIGKILL to a container
        

        It looks like YARN's timeout is pretty aggressive. 250ms isn't much time. I'd say we should make it configurable, since different systems might have different shutdown times. Kafka consumer/producer can shutdown pretty quickly. Others might not be able to. My first instinct is:

        1. Make it configurable.
        2. Default to something higher than 250ms, even though that's the YARN default. Maybe 5s?
        Show
        criccomini Chris Riccomini added a comment - According to yarn-default.xml, there is this setting: yarn.nodemanager.sleep-delay-before-sigkill.ms 250 No. of ms to wait between sending a SIGTERM and SIGKILL to a container It looks like YARN's timeout is pretty aggressive. 250ms isn't much time. I'd say we should make it configurable, since different systems might have different shutdown times. Kafka consumer/producer can shutdown pretty quickly. Others might not be able to. My first instinct is: Make it configurable. Default to something higher than 250ms, even though that's the YARN default. Maybe 5s?
        Hide
        twbecker Tommy Becker added a comment -

        Patch attached. Had to remove the unit test but that seemed acceptable given that it simplified the code.

        Show
        twbecker Tommy Becker added a comment - Patch attached. Had to remove the unit test but that seemed acceptable given that it simplified the code.
        Hide
        criccomini Chris Riccomini added a comment -
        1. Any reason why there are so many changes in the configuration table? Seems like some white space went screwy.
        2. Do we still need removeShutdownHook? Noticed that it's been removed.
        3. Have you tested this code at all?
        Show
        criccomini Chris Riccomini added a comment - Any reason why there are so many changes in the configuration table? Seems like some white space went screwy. Do we still need removeShutdownHook? Noticed that it's been removed. Have you tested this code at all?
        Hide
        twbecker Tommy Becker added a comment -

        Yeah, I mentioned the whitespace in the RB review but forgot to link it here: https://reviews.apache.org/r/32877/. Apparently my IDE cleaned up the trailing spaces and I didn't notice it until I posted the patch. Can resubmit if it's a problem. With regard to #2, as far as I'm aware there is no need to remove a shutdown hook unless you decide you don't want it to run. And yes, I have tested it.

        Show
        twbecker Tommy Becker added a comment - Yeah, I mentioned the whitespace in the RB review but forgot to link it here: https://reviews.apache.org/r/32877/ . Apparently my IDE cleaned up the trailing spaces and I didn't notice it until I posted the patch. Can resubmit if it's a problem. With regard to #2, as far as I'm aware there is no need to remove a shutdown hook unless you decide you don't want it to run. And yes, I have tested it.
        Hide
        criccomini Chris Riccomini added a comment -

        +1 Merged and committed. Thanks!

        Show
        criccomini Chris Riccomini added a comment - +1 Merged and committed. Thanks!
        Hide
        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        Re-open to back port to 0.9.1

        Show
        nickpan47 Yi Pan (Data Infrastructure) added a comment - Re-open to back port to 0.9.1
        Hide
        jghoman Jakob Homan added a comment -

        +1 on backport.

        Show
        jghoman Jakob Homan added a comment - +1 on backport.
        Hide
        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        Merged and committed to 0.9.1. Thanks!

        Show
        nickpan47 Yi Pan (Data Infrastructure) added a comment - Merged and committed to 0.9.1. Thanks!

          People

          • Assignee:
            twbecker Tommy Becker
            Reporter:
            criccomini Chris Riccomini
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development