Kafka / KAFKA-3935

ConnectDistributedTest.test_restart_failed_task.connector_type=sink system test failing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0.1
    • Component/s: KafkaConnect
    • Labels: None

      Description

      This test has failed a few times; see, e.g., http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-07-07--001.1467911236--apache--trunk--efc4c88/report.html. Note that only the sink task variant fails; the source task variant works fine.

      ====================================================================================================
      test_id:    2016-07-06--001.kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_restart_failed_task.connector_type=sink
      status:     FAIL
      run time:   1 minute 10.991 seconds
      
      
          Failed to see task transition to the FAILED state
      Traceback (most recent call last):
        File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", line 106, in run_all_tests
          data = self.run_single_test()
        File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py", line 162, in run_single_test
          return self.current_test_context.function(self.current_test)
        File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py", line 331, in wrapper
          return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
        File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", line 175, in test_restart_failed_task
          err_msg="Failed to see task transition to the FAILED state")
        File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/utils/util.py", line 36, in wait_until
          raise TimeoutError(err_msg)
      TimeoutError: Failed to see task transition to the FAILED state
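
      The traceback ends in ducktape's wait_until, which polls a condition until a deadline and raises TimeoutError with the supplied err_msg. A minimal sketch of that polling pattern follows. It is not ducktape's actual implementation: the name wait_until_sketch and its default backoff are assumptions for illustration, and it uses Python 3's built-in TimeoutError.

```python
import time


def wait_until_sketch(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns a truthy value or the deadline passes.

    Hypothetical helper for illustration only; not ducktape's code.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            # Same failure shape as in the traceback: the caller's message
            # becomes the exception text.
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)
```

      With this shape, a condition that becomes true shortly after the deadline is indistinguishable from one that never becomes true, which is why the timeout alone cannot tell the two hypotheses apart.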
      

      I checked the worker logs, and we do appear to be seeing the exception:

      [2016-07-06 15:22:19,061] ERROR Task mock-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask)
      java.lang.RuntimeException
              at org.apache.kafka.connect.tools.MockSinkTask.put(MockSinkTask.java:58)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:384)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:228)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
              at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
              at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      [2016-07-06 15:22:19,062] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerSinkTask)
      [2016-07-06 15:22:19,062] INFO WorkerSinkTask{id=mock-sink-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
      [2016-07-06 15:22:19,065] DEBUG Group connect-mock-sink committed offset 0 for partition test-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
      [2016-07-06 15:22:19,065] DEBUG Finished WorkerSinkTask{id=mock-sink-0} offset commit successfully in 3 ms (org.apache.kafka.connect.runtime.WorkerSinkTask)
      [2016-07-06 15:22:19,065] ERROR Task mock-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
      org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
              at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:406)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:228)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
              at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
              at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
              at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      [2016-07-06 15:22:19,065] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
      

      So either this is a timing issue, or the error handling in WorkerSinkTask is not properly setting the FAILED state.
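
      One way to tell these two possibilities apart is to watch the task's reported state for longer than the test's timeout and record when it changes. The sketch below assumes a status payload shaped like the Connect REST API's connector status response (a "tasks" list whose entries carry "id" and "state"). The helper names and the fetch_status callable are hypothetical; in practice fetch_status would wrap an HTTP GET against a worker.

```python
import time


def task_state(status, task_id):
    """Return the state of `task_id` from a connector status payload, or None."""
    for task in status.get("tasks", []):
        if task.get("id") == task_id:
            return task.get("state")
    return None


def observe_task_states(fetch_status, task_id, duration_sec, interval_sec=0.5):
    """Record (elapsed_seconds, state) each time the task's state changes.

    `fetch_status` is a hypothetical callable returning the status payload.
    """
    start = time.monotonic()
    transitions = []
    last = object()  # sentinel so the first observed state is always recorded
    while True:
        state = task_state(fetch_status(), task_id)
        if state != last:
            transitions.append((time.monotonic() - start, state))
            last = state
        if time.monotonic() - start >= duration_sec:
            return transitions
        time.sleep(interval_sec)
```

      If FAILED shows up in the transitions only after the test's timeout has elapsed, that points to timing; if it never shows up even though the worker log contains the exception, that points to the state not being set.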

              People

              • Assignee: Ewen Cheslack-Postava (ewencp)
              • Reporter: Ewen Cheslack-Postava (ewencp)
              • Reviewer: Ewen Cheslack-Postava
              • Votes: 0
              • Watchers: 3
