Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-4326

Flink start-up scripts should optionally start services on the foreground

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.3.0, 1.2.1
    • Component/s: Startup Shell Scripts
    • Labels:
      None

      Description

      This has previously been mentioned in the mailing list, but has not been addressed. Flink start-up scripts start the job and task managers in the background. This makes it difficult to integrate Flink with most processes supervisory tools and init systems, including Docker. One can get around this via hacking the scripts or manually starting the right classes via Java, but it is a brittle solution.

      In addition to starting the daemons in the foreground, the start up scripts should use exec instead of running the commends, so as to avoid forks. Many supervisory tools assume the PID of the process to be monitored is that of the process it first executes, and fork chains make it difficult for the supervisor to figure out what process to monitor. Specifically, jobmanager.sh and taskmanager.sh should exec flink-daemon.sh, and flink-daemon.sh should exec java.

        Issue Links

          Activity

          Hide
          aljoscha Aljoscha Krettek added a comment -

          Ismaël Mejía I think you would also be interested in this.

          Show
          aljoscha Aljoscha Krettek added a comment - Ismaël Mejía I think you would also be interested in this.
          Hide
          iemejia Ismaël Mejía added a comment -

          For ref this is the commit of my proposed solution:
          https://github.com/iemejia/flink/commit/af9ac6cdb3f6601d6248abe82df4fd44de4453e5

          Notice that I finally closed the pull request since we could hack this via the && wait for the docker script that was my purpose, but I still agree that having foreground processes has its value. For ref, the JIRA where we discussed this early on:
          https://issues.apache.org/jira/browse/FLINK-4208

          Is there an alternative PR for this ?

          Show
          iemejia Ismaël Mejía added a comment - For ref this is the commit of my proposed solution: https://github.com/iemejia/flink/commit/af9ac6cdb3f6601d6248abe82df4fd44de4453e5 Notice that I finally closed the pull request since we could hack this via the && wait for the docker script that was my purpose, but I still agree that having foreground processes has its value. For ref, the JIRA where we discussed this early on: https://issues.apache.org/jira/browse/FLINK-4208 Is there an alternative PR for this ?
          Hide
          mtanski Milosz Tanski added a comment -

          This would also be valuable for people trying to run on linux systemd to run the service (which is pretty much all new linux distros). And that way the users can use system policies for logging (since systemd automatically collects stdout).

          Show
          mtanski Milosz Tanski added a comment - This would also be valuable for people trying to run on linux systemd to run the service (which is pretty much all new linux distros). And that way the users can use system policies for logging (since systemd automatically collects stdout).
          Hide
          elevy Elias Levy added a comment -

          Not just systemd, but also upstart (my particular case, within an Amazon Linux AMI), deamontools, daemon, and many other supervisory tools.

          Show
          elevy Elias Levy added a comment - Not just systemd, but also upstart (my particular case, within an Amazon Linux AMI), deamontools, daemon, and many other supervisory tools.
          Hide
          iemejia Ismaël Mejía added a comment -

          Well it is good to know that there is interest around the foreground mode, probably it is a good idea to invite Greg Hogan to the discussion since he reviewed my previous PR.

          What do you think ? should I rebase my previous patch and create a PR for this, or any of you guys has a better idea of how to do it ?

          Show
          iemejia Ismaël Mejía added a comment - Well it is good to know that there is interest around the foreground mode, probably it is a good idea to invite Greg Hogan to the discussion since he reviewed my previous PR. What do you think ? should I rebase my previous patch and create a PR for this, or any of you guys has a better idea of how to do it ?
          Hide
          greghogan Greg Hogan added a comment -

          Using exec solves the problem of not knowing the PID until the daemon has launched but doesn't allow for removing the PID after termination. What level of monitoring is performed by the supervisor? Is this simply a "is this process still alive" or more complicated like tracking cpu and memory usage?

          Show
          greghogan Greg Hogan added a comment - Using exec solves the problem of not knowing the PID until the daemon has launched but doesn't allow for removing the PID after termination. What level of monitoring is performed by the supervisor? Is this simply a "is this process still alive" or more complicated like tracking cpu and memory usage?
          Hide
          mtanski Milosz Tanski added a comment -

          Greg Hogan A lot of the traditional things UNIX services have been doing in the past like (daemonizing, logging and rotation) are being taken over by the system via init systems, supervisors, containers.

          The systemd supervisor uses similar Linux features as docker, it can create a new cgroup of processes for a daemon. This way it's able to monitor the whole group and shutdown the whole group. It can apply things like CPU slice and memory limits. And it takes over logging managment (provided the service spews to stdout/err) and from there the system admin can manage all logs in one place on the box or ship them to a logging aggregation service

          Show
          mtanski Milosz Tanski added a comment - Greg Hogan A lot of the traditional things UNIX services have been doing in the past like (daemonizing, logging and rotation) are being taken over by the system via init systems, supervisors, containers. The systemd supervisor uses similar Linux features as docker, it can create a new cgroup of processes for a daemon. This way it's able to monitor the whole group and shutdown the whole group. It can apply things like CPU slice and memory limits. And it takes over logging managment (provided the service spews to stdout/err) and from there the system admin can manage all logs in one place on the box or ship them to a logging aggregation service
          Hide
          greghogan Greg Hogan added a comment -

          The configuration parsing in jobmanager.sh, taskmanager.sh, etc. will be useful for starting processes in foreground mode. Would it be better to leave flink-daemon.sh unchanged and create a new lightweight foreground-only start script?

          Show
          greghogan Greg Hogan added a comment - The configuration parsing in jobmanager.sh , taskmanager.sh , etc. will be useful for starting processes in foreground mode. Would it be better to leave flink-daemon.sh unchanged and create a new lightweight foreground-only start script?
          Hide
          iemejia Ismaël Mejía added a comment -

          Well that's the question I was wondering before my previous PR but then I realized that having a centralized point for all the changes was less error-prone (current flink-daemon.sh), that's the reason I ended up mixing flink-daemon with an action like 'start-foreground', on the other hand we can rename flink-daemon into flink-service and it will make the same but it will have a less confusing naming.

          Show
          iemejia Ismaël Mejía added a comment - Well that's the question I was wondering before my previous PR but then I realized that having a centralized point for all the changes was less error-prone (current flink-daemon.sh), that's the reason I ended up mixing flink-daemon with an action like 'start-foreground', on the other hand we can rename flink-daemon into flink-service and it will make the same but it will have a less confusing naming.
          Hide
          greghogan Greg Hogan added a comment -

          I don't see much overlap between flink-daemon.sh and a flink-console.sh. The latter will not be parsing or appending to the PID file, redirecting output, or stopping.

          Show
          greghogan Greg Hogan added a comment - I don't see much overlap between flink-daemon.sh and a flink-console.sh . The latter will not be parsing or appending to the PID file, redirecting output, or stopping.
          Hide
          iemejia Ismaël Mejía added a comment -

          A separation of (daemon/console) scripts would be the nicest, no doubt. However, I am not sure if removing the PID code + output will be appropriate when we run daemons and foreground processes at the same time, how do we count the running instances if somebody runs a new process in foreground mode, or what would be the logic if we call stop-all, must we kill all the processes even the foreground ones ? in these cases I think we need the PID/output refs, but well I am not really sure and maybe we can do such things without it.

          Independent of this we must also not forget that we should preserve at least the same options (start|stop|stop-all) for both jobmanager.sh and taskmanager. because they do their magic (build the runtime options) and at the end they call the the (daemon/console) script. I suppose we will need the new start-foreground option in these scripts too, or are there any other ideas of how to do it best ?

          Show
          iemejia Ismaël Mejía added a comment - A separation of (daemon/console) scripts would be the nicest, no doubt. However, I am not sure if removing the PID code + output will be appropriate when we run daemons and foreground processes at the same time, how do we count the running instances if somebody runs a new process in foreground mode, or what would be the logic if we call stop-all, must we kill all the processes even the foreground ones ? in these cases I think we need the PID/output refs, but well I am not really sure and maybe we can do such things without it. Independent of this we must also not forget that we should preserve at least the same options (start|stop|stop-all) for both jobmanager.sh and taskmanager. because they do their magic (build the runtime options) and at the end they call the the (daemon/console) script. I suppose we will need the new start-foreground option in these scripts too, or are there any other ideas of how to do it best ?
          Hide
          greghogan Greg Hogan added a comment -

          Isn't the expectation that the supervisor would monitor and control process started in foreground mode?

          I don't have an alternative to start-foreground to suggest.

          Show
          greghogan Greg Hogan added a comment - Isn't the expectation that the supervisor would monitor and control process started in foreground mode? I don't have an alternative to start-foreground to suggest.
          Hide
          elevy Elias Levy added a comment -

          That would be my expectation. Writing out the PID would be nice, but it is not a requirement. Process management becomes the job of the supervisor process.

          Log handling is different. Some supervisors can handle logs output to stdout, some don't. So even when outputting logs to stdout, the process should still be capable of outputting to other locations, like a log directory or to syslog.

          Show
          elevy Elias Levy added a comment - That would be my expectation. Writing out the PID would be nice, but it is not a requirement. Process management becomes the job of the supervisor process. Log handling is different. Some supervisors can handle logs output to stdout, some don't. So even when outputting logs to stdout, the process should still be capable of outputting to other locations, like a log directory or to syslog.
          Hide
          greghogan Greg Hogan added a comment -

          Logging is configured through log4j / logback.

          Show
          greghogan Greg Hogan added a comment - Logging is configured through log4j / logback.
          Hide
          iemejia Ismaël Mejía added a comment -

          Just to make some progress here, I see three options:

          1. I reopen my PR and we let it like it was (it already works and makes the intended goal but has the weirdness of calling flink-daemon.sh start-foreground ...).
          2. I reopen my PR but rename flink-daemon to flink-service to avoid the mix of daemon+foreground (the negative part of this is that it may break some scripts for people using flink-daemon directly).
          3. We create the additional flink-console script with the issue of copy pasting a small chunk of code at the beginning and keeping both in the future (well we can create some sort of common bash code library, but remember bash is not the nicest language for reuse).

          I prefer option 1 because it reuses all the code and it is already done and tested. and I probably could do option 2 or 3 if most people agree on one of those, but I will still need help from your part for the new review + some testing (in particular for case 3).

          WDYT or do you have other suggestions ?

          Show
          iemejia Ismaël Mejía added a comment - Just to make some progress here, I see three options: 1. I reopen my PR and we let it like it was (it already works and makes the intended goal but has the weirdness of calling flink-daemon.sh start-foreground ...). 2. I reopen my PR but rename flink-daemon to flink-service to avoid the mix of daemon+foreground (the negative part of this is that it may break some scripts for people using flink-daemon directly). 3. We create the additional flink-console script with the issue of copy pasting a small chunk of code at the beginning and keeping both in the future (well we can create some sort of common bash code library, but remember bash is not the nicest language for reuse). I prefer option 1 because it reuses all the code and it is already done and tested. and I probably could do option 2 or 3 if most people agree on one of those, but I will still need help from your part for the new review + some testing (in particular for case 3). WDYT or do you have other suggestions ?
          Hide
          greghogan Greg Hogan added a comment -

          Option 2 won't be allowed. We cannot rename the existing script.

          I prefer the separate script (option 3) because I don't see much overlap (no log files, no locking, no stop or stop-all).

          Show
          greghogan Greg Hogan added a comment - Option 2 won't be allowed. We cannot rename the existing script. I prefer the separate script (option 3) because I don't see much overlap (no log files, no locking, no stop or stop-all).
          Hide
          mxm Maximilian Michels added a comment -

          Maybe I'm missing the point here but wouldn't it suffice to have a small change in flink-daemon to avoid forking the process in the background? Apart from introducing an option for that, nothing else should be changed.

          Show
          mxm Maximilian Michels added a comment - Maybe I'm missing the point here but wouldn't it suffice to have a small change in flink-daemon to avoid forking the process in the background? Apart from introducing an option for that, nothing else should be changed.
          Hide
          greghogan Greg Hogan added a comment -

          There is a fair bit of code to handle the logging which will not be used when starting a console app. I have not yet tested my implementation with a new script:
          https://github.com/greghogan/flink/commit/82091fb07f18606fcee8370abe600ffc9f88817b

          Show
          greghogan Greg Hogan added a comment - There is a fair bit of code to handle the logging which will not be used when starting a console app. I have not yet tested my implementation with a new script: https://github.com/greghogan/flink/commit/82091fb07f18606fcee8370abe600ffc9f88817b
          Hide
          StephanEwen Stephan Ewen added a comment -

          Ismaël Mejía Elias Levy What are your thoughts on the approach suggested by Greg Hogan here?

          Jamie Grier Would this also solve your case of "not redirecting stdout" for docker containers?

          Show
          StephanEwen Stephan Ewen added a comment - Ismaël Mejía Elias Levy What are your thoughts on the approach suggested by Greg Hogan here? Jamie Grier Would this also solve your case of "not redirecting stdout" for docker containers?
          Hide
          elevy Elias Levy added a comment -

          Greg's proposal for flink-console.sh will do. The one thing I would change is putting exec in front of $JAVA_RUN in flink-console.sh.

          Show
          elevy Elias Levy added a comment - Greg's proposal for flink-console.sh will do. The one thing I would change is putting exec in front of $JAVA_RUN in flink-console.sh .
          Hide
          jgrier Jamie Grier added a comment -

          Stephan Ewen Yup this will also solve the issue I raised in FLINK-5634 since my primary concern was running Docker containers and having the option to configure Flink to log to stdout.

          Show
          jgrier Jamie Grier added a comment - Stephan Ewen Yup this will also solve the issue I raised in FLINK-5634 since my primary concern was running Docker containers and having the option to configure Flink to log to stdout.
          Hide
          iemejia Ismaël Mejía added a comment -

          Stephan Ewen Sorry for the late answer, I missed the notification somehow, I agree with Elias, Greg's changes should make it.
          Greg Hogan Can you do this as a PR, I can help you test it if you want (and I will fix the docker image).

          Show
          iemejia Ismaël Mejía added a comment - Stephan Ewen Sorry for the late answer, I missed the notification somehow, I agree with Elias, Greg's changes should make it. Greg Hogan Can you do this as a PR, I can help you test it if you want (and I will fix the docker image).
          Hide
          iemejia Ismaël Mejía added a comment -

          Just for reference of the origin of some previous discussion around this.

          Show
          iemejia Ismaël Mejía added a comment - Just for reference of the origin of some previous discussion around this.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user greghogan opened a pull request:

          https://github.com/apache/flink/pull/3351

          FLINK-4326 [scripts] Flink foreground services

          Add a "start-foreground" option to the Flink service scripts which does not daemonize the service nor redirect output.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/greghogan/flink 4326_flink_startup_scripts_should_optionally_start_services_on_the_foreground

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3351.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3351


          commit c9a84fb189ef8342f3673137c2137bcb3fb54df7
          Author: Greg Hogan <code@greghogan.com>
          Date: 2016-10-07T20:06:48Z

          FLINK-4326 [scripts] Flink foreground services

          Add a "start-foreground" option to the Flink service scripts which does
          not daemonize the service nor redirect output.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3351 FLINK-4326 [scripts] Flink foreground services Add a "start-foreground" option to the Flink service scripts which does not daemonize the service nor redirect output. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 4326_flink_startup_scripts_should_optionally_start_services_on_the_foreground Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3351.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3351 commit c9a84fb189ef8342f3673137c2137bcb3fb54df7 Author: Greg Hogan <code@greghogan.com> Date: 2016-10-07T20:06:48Z FLINK-4326 [scripts] Flink foreground services Add a "start-foreground" option to the Flink service scripts which does not daemonize the service nor redirect output.
          Hide
          michim Michi Mutsuzaki added a comment -

          flink-console.sh doesn't define log_setting, and i'm getting a warning like this:

          % ./bin/flink-console.sh jobmanager --configDir conf --executionMode cluster
          Starting jobmanager as a console application on host x1.
          log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
          log4j:WARN Please initialize the log4j system properly.
          log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
          

          maybe it makes sense to provide log4j/logback config files that log to console, and define log_setting to point to these config files?

          Show
          michim Michi Mutsuzaki added a comment - flink-console.sh doesn't define log_setting, and i'm getting a warning like this: % ./bin/flink-console.sh jobmanager --configDir conf --executionMode cluster Starting jobmanager as a console application on host x1. log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. maybe it makes sense to provide log4j/logback config files that log to console, and define log_setting to point to these config files?
          Hide
          greghogan Greg Hogan added a comment -

          Michi Mutsuzaki thank you for reporting the issue. This should now be fixed.

          Show
          greghogan Greg Hogan added a comment - Michi Mutsuzaki thank you for reporting the issue. This should now be fixed.
          Hide
          michim Michi Mutsuzaki added a comment -

          thanks Greg Hogan, lgtm!

          Show
          michim Michi Mutsuzaki added a comment - thanks Greg Hogan , lgtm!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user patricklucas commented on the issue:

          https://github.com/apache/flink/pull/3351

          I fixed the merge conflicts and some indentation here: https://github.com/patricklucas/flink/tree/FLINK-4326_start_in_foreground

          If you update your PR with those changes, hopefully we can get this pushed soon to unblock some of the Docker-related improvements.

          Show
          githubbot ASF GitHub Bot added a comment - Github user patricklucas commented on the issue: https://github.com/apache/flink/pull/3351 I fixed the merge conflicts and some indentation here: https://github.com/patricklucas/flink/tree/FLINK-4326_start_in_foreground If you update your PR with those changes, hopefully we can get this pushed soon to unblock some of the Docker-related improvements.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user patricklucas opened a pull request:

          https://github.com/apache/flink/pull/3492

          FLINK-4326 [scripts] Flink foreground services

          This PR consists of @greghogan's commits from #3351 with the merge conflicts fixed.

          If merged, this PR supplants #3351, which can be closed.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/patricklucas/flink FLINK-4326_start_in_foreground

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3492.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3492


          commit 0867c868508183e2c5b29f8f695fd13dd9f07ea3
          Author: Greg Hogan <code@greghogan.com>
          Date: 2016-10-07T20:06:48Z

          FLINK-4326 [scripts] Flink foreground services

          Add a "start-foreground" option to the Flink service scripts which does
          not daemonize the service nor redirect output.

          commit 41ab9b7e905d6bf7fbf647808f54888923b3257f
          Author: Greg Hogan <code@greghogan.com>
          Date: 2017-02-21T17:37:04Z

          Add logging configuration


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user patricklucas opened a pull request: https://github.com/apache/flink/pull/3492 FLINK-4326 [scripts] Flink foreground services This PR consists of @greghogan's commits from #3351 with the merge conflicts fixed. If merged, this PR supplants #3351, which can be closed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/patricklucas/flink FLINK-4326 _start_in_foreground Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3492.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3492 commit 0867c868508183e2c5b29f8f695fd13dd9f07ea3 Author: Greg Hogan <code@greghogan.com> Date: 2016-10-07T20:06:48Z FLINK-4326 [scripts] Flink foreground services Add a "start-foreground" option to the Flink service scripts which does not daemonize the service nor redirect output. commit 41ab9b7e905d6bf7fbf647808f54888923b3257f Author: Greg Hogan <code@greghogan.com> Date: 2017-02-21T17:37:04Z Add logging configuration
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3492

          Thanks for this PR and the fixes on top of it Patrick. If there are no objections, I would like to merge this. Since the commits are still attributed to Greg, I think it's fine to merge this instead of #3351.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3492 Thanks for this PR and the fixes on top of it Patrick. If there are no objections, I would like to merge this. Since the commits are still attributed to Greg, I think it's fine to merge this instead of #3351.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3492

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3492
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3351

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3351
          Hide
          StephanEwen Stephan Ewen added a comment -

          Fixed in

          • 1.2.1 via 88db01a0e9997db94459bcecebc8e7d257dae0ea
          • 1.3.0 via 338c30a41d4ff04ce196bdaeb5251a222dc109c0
          Show
          StephanEwen Stephan Ewen added a comment - Fixed in 1.2.1 via 88db01a0e9997db94459bcecebc8e7d257dae0ea 1.3.0 via 338c30a41d4ff04ce196bdaeb5251a222dc109c0

            People

            • Assignee:
              greghogan Greg Hogan
              Reporter:
              elevy Elias Levy
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development