Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5669

flink-streaming-contrib DataStreamUtils.collect in local environment mode fails when offline

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 1.2.1
    • Component/s: flink-contrib
    • Labels:
      None

      Description

      DataStreamUtils.collect() needs to obtain the local machine's IP so that the job can send the results back. In the case of local StreamEnvironments, it uses InetAddress.getLocalHost(), which attempts to resolve the local hostname using DNS.

      If DNS is not available (for example, when offline) or if DNS is available but cannot resolve the hostname (for example, if the hostname is an intranet name but the machine is not currently on that network), an UnknownHostException will be thrown (and wrapped in an IOException).
      If the resolved IP is not reachable for some reason, streaming results will fail.

      Since this case is for local execution only, it seems that using InetAddress.getLoopbackAddress() would work just as well, and avoid the assumptions made by getLocalHost().

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user rick-cox opened a pull request:

          https://github.com/apache/flink/pull/3223

          FLINK-5669 Change DataStreamUtils to use the loopback address (127.0.0.1)

          For local environments, using loopback rather than the "local address" allows tests to run in
          situations where the local machine's hostname may not be resolvable in DNS
          (because DNS is unreachable or the hostname is not found) or the hostname does
          resolve, but not to an IP address that is reachable.

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [x] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [x] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [x] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/rick-cox/flink master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3223.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3223


          commit e12da9f1b2542fb1d5f862e1894c16ea834addcb
          Author: Rick Cox <rickcox@amazon.com>
          Date: 2017-01-26T22:55:23Z

          FLINK-5669 Change DataStreamUtils to use the loopback address (127.0.0.1) with local environments.

          Using loopback rather than the "local address" allows tests to run in
          situations where the local machine's hostname may not be resolvable in DNS
          (because DNS is unreacable or the hostname is not found) or the hostname does
          resolve, but not to an IP address that is reachable.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user rick-cox opened a pull request: https://github.com/apache/flink/pull/3223 FLINK-5669 Change DataStreamUtils to use the loopback address (127.0.0.1) For local environments, using loopback rather than the "local address" allows tests to run in situations where the local machine's hostname may not be resolvable in DNS (because DNS is unreachable or the hostname is not found) or the hostname does resolve, but not to an IP address that is reachable. Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [x] General The pull request references the related JIRA issue (" [FLINK-XXX] Jira title text") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [x] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [x] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/rick-cox/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3223.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3223 commit e12da9f1b2542fb1d5f862e1894c16ea834addcb Author: Rick Cox <rickcox@amazon.com> Date: 2017-01-26T22:55:23Z FLINK-5669 Change DataStreamUtils to use the loopback address (127.0.0.1) with local environments. Using loopback rather than the "local address" allows tests to run in situations where the local machine's hostname may not be resolvable in DNS (because DNS is unreacable or the hostname is not found) or the hostname does resolve, but not to an IP address that is reachable.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3223

          The change basically renders the utility unusable, other than for local execution...

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3223 The change basically renders the utility unusable, other than for local execution...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rick-cox commented on the issue:

          https://github.com/apache/flink/pull/3223

          Thanks for taking a look. Could you explain more? The modified code is in the else branch of ```if (env instanceof RemoteStreamEnvironment)```, so my assumption was that it only needed to work for local execution. (And that it wouldn't have worked if used with remote execution before, because it just selects the IP returned by DNS, which is often not the IP that a remote taskmanager would need to use to reach the client — hence the use of ```ConnectionUtils.findConnectingAddress``` in the remote case.)

          Show
          githubbot ASF GitHub Bot added a comment - Github user rick-cox commented on the issue: https://github.com/apache/flink/pull/3223 Thanks for taking a look. Could you explain more? The modified code is in the else branch of ```if (env instanceof RemoteStreamEnvironment)```, so my assumption was that it only needed to work for local execution. (And that it wouldn't have worked if used with remote execution before, because it just selects the IP returned by DNS, which is often not the IP that a remote taskmanager would need to use to reach the client — hence the use of ```ConnectionUtils.findConnectingAddress``` in the remote case.)
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3223

          Sorry for answering so brief before.

          The branch you changed is executed for all non-remote environments. Those are the local environments, but also the "context environments", which are used when you submit a job via the command line to a standalone/yarn/mesos cluster.

          You can probably make a change to check explicitly whether the environment is a local environment, and use the loopback address only then, and keep the local host in the other cases.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3223 Sorry for answering so brief before. The branch you changed is executed for all non-remote environments. Those are the local environments, but also the "context environments", which are used when you submit a job via the command line to a standalone/yarn/mesos cluster. You can probably make a change to check explicitly whether the environment is a local environment, and use the loopback address only then, and keep the local host in the other cases.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rick-cox commented on the issue:

          https://github.com/apache/flink/pull/3223

          Thanks for the detailed explanation. I confirmed that using the previous patch with a standalone cluster failed, then applied the suggested alternative approach and tested that works with both standalone and local (online and offline). (I don't have ready access to Yarn or Mesos clusters, but believe they should be fine since this version only affects LocalStreamEnvironments).

          Show
          githubbot ASF GitHub Bot added a comment - Github user rick-cox commented on the issue: https://github.com/apache/flink/pull/3223 Thanks for the detailed explanation. I confirmed that using the previous patch with a standalone cluster failed, then applied the suggested alternative approach and tested that works with both standalone and local (online and offline). (I don't have ready access to Yarn or Mesos clusters, but believe they should be fine since this version only affects LocalStreamEnvironments).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3223

          This change looks good, thank you!

          Merging this...

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3223 This change looks good, thank you! Merging this...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3223

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3223
          Hide
          StephanEwen Stephan Ewen added a comment -

          Fixed in

          • 1.2.1 via 6955030d5faaf3ffb6156171eed38f10c254295d
          • 1.3.0 via 3104619250fa0e0e87b4bb3e05b1cce9d39e6983
          Show
          StephanEwen Stephan Ewen added a comment - Fixed in 1.2.1 via 6955030d5faaf3ffb6156171eed38f10c254295d 1.3.0 via 3104619250fa0e0e87b4bb3e05b1cce9d39e6983

            People

            • Assignee:
              Unassigned
              Reporter:
              rickcox Rick Cox
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development