Flink / FLINK-6629

ClusterClient cannot submit jobs to HA cluster if address not set in configuration

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 1.2.1, 1.4.0
    • Fix Version/s: 1.3.0, 1.4.0
    • Component/s: Client
    • Labels: None

      Description

      In the general case, the ClusterClient fails to submit jobs to an HA cluster. The problem is the LazyActorSystemLoader, which creates an ActorSystem on first call. To create the ActorSystem, it reads the JobManager's address from the Configuration and uses it to find the connecting address via ConnectionUtils.findConnectingAddress. However, the address in the configuration is only valid in the non-HA case. In the HA case, we have to obtain the leader's address from ZooKeeper. Therefore, if the address is not explicitly set in flink-conf.yaml, the ClusterClient either fails with a RuntimeException (if no address has been specified at all) or uses an invalid address and retrieves the wrong connecting address.
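
      The distinction described above can be sketched as follows. This is a minimal, self-contained illustration and not Flink's actual code: the class JobManagerAddressSketch, the LeaderLookup interface, and the method resolveJobManagerAddress are hypothetical names, and a plain Map stands in for Flink's Configuration. The keys high-availability, jobmanager.rpc.address, and jobmanager.rpc.port are used here on the assumption that they match the entries in flink-conf.yaml.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: choose the source of the JobManager address by mode.
class JobManagerAddressSketch {

    /** Hypothetical stand-in for asking the HA services (e.g. ZooKeeper) for the leader. */
    interface LeaderLookup {
        Optional<String> currentLeaderAddress();
    }

    static String resolveJobManagerAddress(Map<String, String> config, LeaderLookup leaderLookup) {
        // Assumption: HA mode is signalled by "high-availability: zookeeper".
        boolean haEnabled =
                "zookeeper".equalsIgnoreCase(config.getOrDefault("high-availability", "none"));

        if (haEnabled) {
            // In HA mode the configured address may be missing or stale,
            // so only the leader reported by the HA services is trusted.
            return leaderLookup.currentLeaderAddress()
                    .orElseThrow(() -> new IllegalStateException("No JobManager leader available"));
        }

        // Non-HA mode: the configured address is authoritative.
        String host = config.get("jobmanager.rpc.address");
        if (host == null) {
            throw new IllegalStateException("jobmanager.rpc.address is not set");
        }
        String port = config.getOrDefault("jobmanager.rpc.port", "6123");
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // HA setup without any configured JobManager address.
        Map<String, String> haConfig = Map.of("high-availability", "zookeeper");
        LeaderLookup lookup = () -> Optional.of("leader-host:6123");
        System.out.println(resolveJobManagerAddress(haConfig, lookup));
    }
}
```

      The point of the sketch is that the HA branch never reads the configured address, so a missing or outdated entry in flink-conf.yaml cannot influence which address the client connects to.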

          Activity

          till.rohrmann Till Rohrmann added a comment -

          Aljoscha Krettek I had to wait for the backport to be merged first.

          till.rohrmann Till Rohrmann added a comment -

          1.4.0: d246515963e6f736dc78114ae1dbecbbcd93ed32
          1.3.0: f783e529c558ac3df68b4e69fb931f2d55b55db7

          aljoscha Aljoscha Krettek added a comment -

          Till Rohrmann This can be closed, right? 😃

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3949

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3949

          Thanks for the review @rmetzger. Merging this PR.

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3949

          +1 to merge this change

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3949

          FLINK-6629 Use HAServices to find connecting address for ClusterClient's ActorSystem

          The ClusterClient starts its ActorSystem lazily. In order to find out the address
          to which to bind, the ClusterClient tries to connect to the JobManager. In order
          to find out the JobManager's address it is important to use the
          HighAvailabilityServices instead of retrieving the address information from the
          configuration, because otherwise it conflicts with HA mode.

          cc @rmetzger.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink fixClusterClient

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3949.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3949


          commit 6c77c336f402936d684497dc5f707fb713e52c7e
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-05-19T10:01:51Z

          FLINK-6635 [test] Fix ClientConnectionTest

          The ClientConnectionTest passed even though the logic under test was failing. The test
          expected an exception and checked that a special word was contained in the
          exception's message. Unfortunately, we generated an AssertionError with the same
          word if the actual logic we wanted to test failed. That caused the test to pass.

          commit b75eef9532b033b49b9a192597ccaec101203447
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-05-19T12:31:19Z

          FLINK-6629 Use HAServices to find connecting address for ClusterClient's ActorSystem

          The ClusterClient starts its ActorSystem lazily. In order to find out the address
          to which to bind, the ClusterClient tries to connect to the JobManager. In order
          to find out the JobManager's address it is important to use the
          HighAvailabilityServices instead of retrieving the address information from the
          configuration, because otherwise it conflicts with HA mode.
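
          The ClientConnectionTest problem described in the first commit above (FLINK-6635) is a general testing pitfall. The original test code is not shown in this issue, so the following is only an illustrative reconstruction under assumptions: the class ExpectedExceptionPitfall, both check methods, and the keyword "unreachable" are hypothetical. It shows how a failure signal thrown inside the try block can be swallowed by a broad catch and mistaken for the expected exception.

```java
// Hypothetical reconstruction of the pitfall; not the actual ClientConnectionTest.
class ExpectedExceptionPitfall {

    // Assumed keyword that the test looks for in the exception message.
    static final String KEYWORD = "unreachable";

    /** Flawed check: returns true ("test passed") even when no exception was thrown. */
    static boolean flawedCheck(Runnable connect) {
        try {
            connect.run();
            // Meant to signal failure, but the message contains the keyword
            // and the broad catch below treats it like the expected exception.
            throw new AssertionError("expected the JobManager to be " + KEYWORD);
        } catch (Throwable t) {
            return t.getMessage() != null && t.getMessage().contains(KEYWORD);
        }
    }

    /** Fixed check: the failure path lies outside the try/catch and cannot be swallowed. */
    static boolean fixedCheck(Runnable connect) {
        try {
            connect.run();
        } catch (RuntimeException e) {
            return e.getMessage() != null && e.getMessage().contains(KEYWORD);
        }
        // No exception at all means the logic under test misbehaved.
        return false;
    }

    public static void main(String[] args) {
        Runnable connectsUnexpectedly = () -> { };
        System.out.println("flawed check passes: " + flawedCheck(connectsUnexpectedly));
        System.out.println("fixed check passes:  " + fixedCheck(connectsUnexpectedly));
    }
}
```

          In this sketch the fix consists of catching only the exception type that the code under test is expected to throw and reporting failure outside the try block.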



            People

            • Assignee: Till Rohrmann
            • Reporter: Till Rohrmann
            • Votes: 0
            • Watchers: 3
