SAMZA-1385

Fix coordination issues during stream creation in LocalApplicationRunner

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.1
    • Component/s: None
    • Labels: None

      Description

      Bug fixes related to coordination logic around stream creation.

      In applications that create intermediate streams, a process that has already acquired leadership during the stream creation phase ends up waiting for itself to become the leader during the job coordination phase. This starvation occurs because both leader elections use the same ZooKeeper node. We need to use separate nodes for the job model creation and stream creation leader elections.
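
The starvation described above can be illustrated with a small in-memory sketch. This is not Samza or ZooKeeper code: `ToyElections`, the registration IDs, and the paths are hypothetical names chosen for illustration, and the sketch assumes an election in which the earliest registration under a path is that path's leader (as in the common ZooKeeper leader-election recipe).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaderElectionSketch {

    /** Toy in-memory election (assumption): the earliest registration under a path is its leader. */
    static final class ToyElections {
        private final Map<String, List<String>> registrationsByPath = new HashMap<>();

        /** Registers a participant under the path and reports whether it is now the leader. */
        boolean registerAndCheckLeader(String path, String registrationId) {
            List<String> registrations =
                registrationsByPath.computeIfAbsent(path, p -> new ArrayList<>());
            registrations.add(registrationId);
            return registrations.get(0).equals(registrationId);
        }
    }

    /**
     * Simulates one process that first wins the stream-creation election and then
     * joins the job-coordination election. Returns whether it leads the second one.
     */
    static boolean leadsJobCoordination(String streamCreationPath, String jobModelPath) {
        ToyElections elections = new ToyElections();
        // Phase 1: the process wins the stream-creation election and keeps that registration.
        elections.registerAndCheckLeader(streamCreationPath, "process-1/stream-creation");
        // Phase 2: the same process registers again, this time for job coordination.
        return elections.registerAndCheckLeader(jobModelPath, "process-1/job-coordination");
    }

    public static void main(String[] args) {
        // Shared path: the earlier stream-creation registration stays in front, so the
        // process would wait on itself indefinitely.
        System.out.println(leadsJobCoordination("/app/leader", "/app/leader")); // prints "false"
        // Separate (hypothetical) paths: the job-coordination registration is first on its own path.
        System.out.println(leadsJobCoordination("/app/streamCreation/leader", "/app/jobModel/leader")); // prints "true"
    }
}
```

With a shared path the second registration can never reach the front, because the registration blocking it belongs to the same process. Giving each election its own path removes that dependency.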

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user bharathkk opened a pull request:

          https://github.com/apache/samza/pull/265

          SAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoordinator

          Tested the fix with a sample page-view/ad-click joiner job.
          @navina @sborya @nickpan47 can you please take a look at the RB?

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bharathkk/samza master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/samza/pull/265.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #265


          commit c86ec0a22b8e18f544343609a3d8b423cd85658a
          Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
          Date: 2017-05-26T20:14:17Z

          Extract hdfs docs into its own section

          commit eb48b5d3a2f651b44d0402287682868c58d1d444
          Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
          Date: 2017-06-08T22:40:54Z

          Merge remote-tracking branch 'upstream/master'

          commit a28cb882d7790c35ed54f9f9d9291b75a63faf1c
          Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
          Date: 2017-08-08T00:34:35Z

          Merge remote-tracking branch 'upstream/master'

          commit 37c5879517b0ea931aa8f8bdd3aadd32c8b1106d
          Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
          Date: 2017-08-09T18:07:15Z

          Merge remote-tracking branch 'upstream/master'

          commit e2ed4c8fe5d2976d0783e8b3fe978dc7ebcd4483
          Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
          Date: 2017-08-09T22:11:25Z

          SAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoordinator


          nickpan47 Yi Pan (Data Infrastructure) added a comment -

          Bharath Kumarasubramanian, could you make the description of the issue a bit clearer? I assume you are saying that the leader election for stream creation and the leader election for the job coordinator use the same znode path. Hence, a single process can acquire leadership for stream creation and then wait for itself to become the leader JC, which will never happen, because it is waiting for itself to release the stream creation leader position held in that same znode?

          So, at a high level, a straightforward solution is to separate the leader election nodes for stream creation and job model creation (i.e. job coordinator). Is that right?

          navina Navina Ramesh added a comment -

          Yi Pan (Data Infrastructure) Yep. You have very accurately described the issue.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/samza/pull/265

          navina Navina Ramesh added a comment -

          Issue resolved by pull request 265
          https://github.com/apache/samza/pull/265

          githubbot ASF GitHub Bot added a comment -

          GitHub user sborya opened a pull request:

          https://github.com/apache/samza/pull/284

          SAMZA-1385: Coordination utils factory with distributed lock

          This PR includes some changes from another PR. I will re-merge it after the other PR is in.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/sborya/samza CoordinationUtilsFactory_withDistributedLock

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/samza/pull/284.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #284


          commit 9de7539727e227c92f67be4ae0d9de4a8a1ae26c
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-22T17:36:18Z

          created the factory

          commit b6363ccfe6bb458047df11fff3e012e06cdd79b8
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-22T18:42:56Z

          removed reset, added close() per interface

          commit 1ac7ae9d260380d61afa7fea7a12d1ae110dc334
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-22T18:47:18Z

          use getName() for class name

          commit 14e64e95443113669563e447cb6e3b86749615d4
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-23T00:23:11Z

          update 4.8.1

          commit b15e0fce7a336b3663529e3e9e7539f85f6afe2c
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-23T00:35:50Z

          update 4.8.1

          commit ec075bfd2731e429598c20d357cf6c9550ea51f1
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-23T18:02:46Z

          added Distributed lock

          commit 71cafa96e7abede5d76e8a907c2354bcdaff33a0
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-24T01:59:52Z

          added locks

          commit 436519b0121136b1edb2c3b8fa6e4784d45244e3
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-24T22:32:42Z

          abstract

          commit e0f22f09f4de6528a30cd1541974314e48dec14f
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-25T21:04:09Z

          Merge branch 'master' into CoordinationUtilsFactory_withDistributedLock

          commit 52f4d3d13e872ba86894f7fe5161067458cbb5d1
          Author: Boris Shkolnik <boryas@apache.org>
          Date: 2017-08-25T21:04:53Z

          merge


          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/samza/pull/284


            People

            • Assignee:
              bharathkk Bharath Kumarasubramanian
              Reporter:
              bharathkk Bharath Kumarasubramanian
            • Votes:
              0
              Watchers:
              4
