Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-6300

PID1 of docker images does not behave correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.4
    • Fix Version/s: 1.3.0, 1.2.2
    • Component/s: Docker
    • Labels:
      None
    • Environment:

      all

      Description

      When running the task manager and job manager docker images the process with PID1 is a bash script.

      There is a problem in using bash as the PID1 process in a docker
      container as docker sends SIGTERM, but bash doesn't send this to its
      child processes.

      This means for example that if a container was ever killed and a child
      process had a file open then the file may get corrupted.

      It's covered in more detail in a blog post here:
      https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

      From the mailing list (Nico):
      "Some background:
      Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for
      jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with
      these scripts, they do not seem to use exec themselves for foreground
      processes and thus may run into the problem you described.
      I may be wrong, but I did not find any other fallback to handle this in the
      current code base."

      Potentially useful information:

      dockerd version 1.1.3 added an init flag:

      "You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container."
      from:
      https://docs.docker.com/engine/reference/run/#restart-policies---restart

      perhaps the fix could be just to update readme for these images to specify to use this flag.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user patricklucas opened a pull request:

          https://github.com/apache/flink/pull/3734

          FLINK-6300 Use 'exec' in start-foreground calls

          To avoid signal-handling issues in Docker, applications need to run as
          a single executable or use a process manager that forwards signals
          correctly, in either case running as PID 1.

          Since Flink uses a number of chained scripts before the ultimate call
          to `java`, we need to use `exec` so that the script executable is
          replaced, ultimately resulting in a single `java` process as PID 1.

          There's no need to run a process manager since Flink only actually
          requires a single process.

          *Note:* As there has not been a Flink release since these codepaths were introduced, I'm comfortable with merging this change for 1.2.1 since my manual testing was successful.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/patricklucas/flink FLINK-6300_docker_pid_1_fix

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3734.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3734


          commit 3be49de04c239083a5f050bf891834fc8abc12c9
          Author: Patrick Lucas <me@patricklucas.com>
          Date: 2017-04-18T12:48:49Z

          FLINK-6300 Use 'exec' in start-foreground calls

          To avoid signal-handling issues in Docker, applications need to run as
          a single executable or use a process manager that forwards signals
          correctly, in either case running as PID 1.

          Since Flink uses a number of chained scripts before the ultimate call
          to `java`, we need to use `exec` so that the script executable is
          replaced, ultimately resulting in a single `java` process as PID 1.

          There's no need to run a process manager since Flink only actually
          requires a single process.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user patricklucas opened a pull request: https://github.com/apache/flink/pull/3734 FLINK-6300 Use 'exec' in start-foreground calls To avoid signal-handling issues in Docker, applications need to run as a single executable or use a process manager that forwards signals correctly, in either case running as PID 1. Since Flink uses a number of chained scripts before the ultimate call to `java`, we need to use `exec` so that the script executable is replaced, ultimately resulting in a single `java` process as PID 1. There's no need to run a process manager since Flink only actually requires a single process. * Note: * As there has not been a Flink release since these codepaths were introduced, I'm comfortable with merging this change for 1.2.1 since my manual testing was successful. You can merge this pull request into a Git repository by running: $ git pull https://github.com/patricklucas/flink FLINK-6300 _docker_pid_1_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3734.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3734 commit 3be49de04c239083a5f050bf891834fc8abc12c9 Author: Patrick Lucas <me@patricklucas.com> Date: 2017-04-18T12:48:49Z FLINK-6300 Use 'exec' in start-foreground calls To avoid signal-handling issues in Docker, applications need to run as a single executable or use a process manager that forwards signals correctly, in either case running as PID 1. Since Flink uses a number of chained scripts before the ultimate call to `java`, we need to use `exec` so that the script executable is replaced, ultimately resulting in a single `java` process as PID 1. There's no need to run a process manager since Flink only actually requires a single process.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user patricklucas commented on the issue:

          https://github.com/apache/flink/pull/3734

          cc @greghogan, fyi

          Show
          githubbot ASF GitHub Bot added a comment - Github user patricklucas commented on the issue: https://github.com/apache/flink/pull/3734 cc @greghogan, fyi
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user greghogan commented on the issue:

          https://github.com/apache/flink/pull/3734

          +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3734 +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3734

          +1 (seems to work nicely, as expected)

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3734 +1 (seems to work nicely, as expected)
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user patricklucas commented on the issue:

          https://github.com/apache/flink/pull/3734

          @greghogan do you think you could merge this? (also to 1.2.1)

          If not I can try to rope in someone else.

          Show
          githubbot ASF GitHub Bot added a comment - Github user patricklucas commented on the issue: https://github.com/apache/flink/pull/3734 @greghogan do you think you could merge this? (also to 1.2.1) If not I can try to rope in someone else.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user greghogan commented on the issue:

          https://github.com/apache/flink/pull/3734

          @patricklucas yes, had just come back to say that

          Show
          githubbot ASF GitHub Bot added a comment - Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3734 @patricklucas yes, had just come back to say that
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user patricklucas commented on the issue:

          https://github.com/apache/flink/pull/3734

          Great, thanks!

          Show
          githubbot ASF GitHub Bot added a comment - Github user patricklucas commented on the issue: https://github.com/apache/flink/pull/3734 Great, thanks!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3734

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3734
          Hide
          greghogan Greg Hogan added a comment -

          1.3.0: 9d3f127e4f0ddd15ae7af393e0c5b0d3dd81d812
          1.2.2: 4a1aa4ce80fd886266ee7ba76caf27feba3c0f43

          Show
          greghogan Greg Hogan added a comment - 1.3.0: 9d3f127e4f0ddd15ae7af393e0c5b0d3dd81d812 1.2.2: 4a1aa4ce80fd886266ee7ba76caf27feba3c0f43

            People

            • Assignee:
              plucas Patrick Lucas
              Reporter:
              ksharp kathleen sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development