Kafka / KAFKA-4042

DistributedHerder thread can die because of connector & task lifecycle exceptions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1.0
    • Component/s: KafkaConnect
    • Labels:
      None

      Description

      As one example, there is no exception handling in DistributedHerder.startConnector() or anywhere in its call chain, which originates in tick() on the herder thread. An exception thrown there, for instance because of a bad class name in the connector config, therefore propagates up and kills the herder thread. (Report of the issue in the wild: https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)
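The failure mode described above can be illustrated with a minimal sketch. This is not Kafka Connect code: the class `HerderLoopSketch`, its methods, and the class name `com.example.NoSuchConnector` are hypothetical stand-ins, and the loop is a deliberately simplified model of a tick that starts several connectors on one thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a herder loop; not Kafka Connect code. */
final class HerderLoopSketch {

    /** Loads a connector class by name; a bad name surfaces as an unchecked exception. */
    static void startConnector(String className) {
        try {
            Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Failed to find connector class " + className, e);
        }
    }

    /** Unprotected tick: the first bad class name propagates out of the loop. */
    static int tickUnprotected(List<String> classNames) {
        int started = 0;
        for (String name : classNames) {
            startConnector(name); // an exception here aborts the whole tick
            started++;
        }
        return started;
    }

    /** Protected tick: each failure is recorded and the remaining connectors still start. */
    static int tickProtected(List<String> classNames, List<String> failed) {
        int started = 0;
        for (String name : classNames) {
            try {
                startConnector(name);
                started++;
            } catch (RuntimeException e) {
                failed.add(name);
            }
        }
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> configs = Arrays.asList(
                "java.lang.String", "com.example.NoSuchConnector", "java.util.ArrayList");

        // Without containment, the exception escapes run() and the thread terminates.
        Thread herder = new Thread(() -> tickUnprotected(configs), "herder-sketch");
        herder.setUncaughtExceptionHandler(
                (t, e) -> System.out.println(t.getName() + " died: " + e.getMessage()));
        herder.start();
        herder.join();

        // With containment, the bad connector is reported and the loop keeps going.
        List<String> failed = new ArrayList<>();
        int started = tickProtected(configs, failed);
        System.out.println("started=" + started + " failed=" + failed);
    }
}
```

In the unprotected variant a single bad config stops every later connector from being started on that thread, which is the essence of the reported problem.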

          Activity

          ASF GitHub Bot added a comment -

          GitHub user shikhar opened a pull request:

          https://github.com/apache/kafka/pull/1745

          WIP: KAFKA-4042: prevent DistributedHerder thread from dying from connector/task lifecycle exceptions

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/shikhar/kafka distherder-stayup

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/1745.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1745


          commit 715e7535d4aaae6174b3d6c1607617b382f0a8b8
          Author: Shikhar Bhushan <shikhar@confluent.io>
          Date: 2016-08-16T19:06:56Z

          KAFKA-4042: prevent DistributedHerder thread from dying from connector/task lifecycle exceptions


          ASF GitHub Bot added a comment -

          GitHub user shikhar opened a pull request:

          https://github.com/apache/kafka/pull/1778

          KAFKA-4042: Contain connector & task start/stop failures within the Worker

          Invoke the statusListener.onFailure() callback on start failures so that the statusBackingStore is updated. This required a fix to putSafe(), which previously prevented any update that was not preceded by a (non-safe) put() from completing, as is the case here when a connector or task transitions directly to FAILED.

          Worker start methods can still throw if the same connector name or task ID is already registered with the worker, as this condition should not happen.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/shikhar/kafka distherder-stayup-take4

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/1778.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1778


          commit 050b80331f63ec71f16a644e7fa8006823c94ecc
          Author: Shikhar Bhushan <shikhar@confluent.io>
          Date: 2016-08-23T23:00:10Z

          KAFKA-4042: Contain connector & task start/stop failures within the Worker

          Invoke the statusListener.onFailure() callback on start failures so that the statusBackingStore is updated. This required a fix to putSafe(), which previously prevented any update that was not preceded by a (non-safe) put() from completing, as is the case here when a connector or task transitions directly to FAILED.

          Worker start methods can still throw if the same connector name or task ID is already registered with the worker, as this condition should not happen.
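The containment approach described in this pull request can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `WorkerSketch`, `StatusListener`, `InMemoryStatusStore`, and the boolean return value are hypothetical names and choices made for the example, and the store is a plain in-memory map rather than Kafka Connect's status backing store.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of containing start failures in a worker; names are illustrative. */
final class WorkerSketch {

    /** Callback used to report lifecycle transitions. */
    interface StatusListener {
        void onStartup(String connector);
        void onFailure(String connector, Throwable cause);
    }

    /** Minimal in-memory status store standing in for a status backing store. */
    static final class InMemoryStatusStore implements StatusListener {
        final Map<String, String> states = new HashMap<>();

        @Override
        public void onStartup(String connector) {
            states.put(connector, "RUNNING");
        }

        // A connector may go straight to FAILED with no earlier state recorded.
        @Override
        public void onFailure(String connector, Throwable cause) {
            states.put(connector, "FAILED");
        }
    }

    private final Map<String, Runnable> connectors = new HashMap<>();
    private final StatusListener listener;

    WorkerSketch(StatusListener listener) {
        this.listener = listener;
    }

    /**
     * Returns true if the connector started. A start failure is reported to the
     * listener instead of being thrown; a duplicate name still throws because it
     * indicates a caller bug rather than a connector problem.
     */
    boolean startConnector(String name, Runnable startLogic) {
        if (connectors.containsKey(name)) {
            throw new IllegalStateException("Connector already registered: " + name);
        }
        try {
            startLogic.run();
        } catch (Throwable t) {
            listener.onFailure(name, t);
            return false;
        }
        connectors.put(name, startLogic);
        listener.onStartup(name);
        return true;
    }

    public static void main(String[] args) {
        InMemoryStatusStore store = new InMemoryStatusStore();
        WorkerSketch worker = new WorkerSketch(store);
        worker.startConnector("good", () -> { });
        worker.startConnector("bad", () -> { throw new RuntimeException("bad class name"); });
        System.out.println(store.states);
    }
}
```

The caller's loop never sees the start failure as an exception; it only observes the return value, while the recorded FAILED state makes the problem visible to anyone querying status.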


          ASF GitHub Bot added a comment -

          Github user shikhar closed the pull request at:

          https://github.com/apache/kafka/pull/1745

          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/1778

          Ewen Cheslack-Postava added a comment -

          Issue resolved by pull request 1778
          https://github.com/apache/kafka/pull/1778


            People

            • Assignee: Shikhar Bhushan
            • Reporter: Shikhar Bhushan
            • Votes: 0
            • Watchers: 4
