  1. Samza
  2. SAMZA-1069

Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

    Description

      We have identified a deadlock between the main thread, which calls KafkaSystemProducer.close(), and the KafkaProducer client library's network thread, which invokes the callback function registered in KafkaSystemProducer.send().

      The scenario is the following:

      1. The SamzaContainer main thread catches an exception from a previous commit and the container initiates shutdown. Shutdown calls KafkaSystemProducer.stop(), which acquires the synchronized producerLock in KafkaSystemProducer and then calls KafkaProducer.flush() to wait for all pending requests to complete.
      2. The KafkaProducer network I/O thread then invokes KafkaSystemProducer's callback function (from RecordBatch.done()). The callback blocks on the same producerLock, so it cannot return. Until it returns, the network thread cannot call producerFuture.done() and release the CountDownLatch that the main thread is waiting on in KafkaSystemProducer.close(). Neither thread can make progress. Hence, deadlock!
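
The shape of the two steps above can be reproduced without Kafka. The following is a minimal sketch, not Samza or kafka-clients code: the class ProducerDeadlockSketch, its methods, and the pendingSend latch are hypothetical stand-ins for producerLock, the send callback, and the latch that flush() waits on. A timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the reported lock ordering; not the real Samza classes.
class ProducerDeadlockSketch {
    // Stand-in for the synchronized producerLock in KafkaSystemProducer.
    private final Object producerLock = new Object();
    // Stand-in for the latch that is released only after the callback returns.
    private final CountDownLatch pendingSend = new CountDownLatch(1);

    // Models the send callback running on the network I/O thread.
    private void onSendComplete() {
        synchronized (producerLock) {
            // Callback bookkeeping would happen here, under the lock.
        }
        // Only reached once the callback has obtained and released the lock.
        pendingSend.countDown();
    }

    // Models stop(): takes the lock, then waits for pending sends while holding it.
    // Returns true if the pending send completed before the timeout.
    boolean stopHoldingLock(long timeoutMs) {
        synchronized (producerLock) {
            Thread ioThread = new Thread(this::onSendComplete, "network-io");
            ioThread.setDaemon(true);
            ioThread.start();
            try {
                // The I/O thread is blocked on producerLock, so this can only time out.
                return pendingSend.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        boolean completed = new ProducerDeadlockSketch().stopHoldingLock(500);
        System.out.println("flush completed: " + completed); // prints "flush completed: false"
    }
}
```

Without the timeout, the await would never return, which corresponds to the hang described in the scenario.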

      We need to make sure that KafkaSystemProducer.close() cannot race with the callbacks triggered by the KafkaProducer's network thread.
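
One common way to meet this requirement is to never wait for pending sends while holding the lock that the callback also needs. The sketch below illustrates that idea only; it is an assumption for illustration and not necessarily the change that resolved this issue. The class ProducerStopSketch, the stopping flag, and the pendingSend latch are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a stop() that does not block while holding the lock.
class ProducerStopSketch {
    // Stand-in for producerLock; guards only short state updates.
    private final Object producerLock = new Object();
    // Stand-in for the latch released after the send callback returns.
    private final CountDownLatch pendingSend = new CountDownLatch(1);
    private boolean stopping = false;

    // Models the send callback running on the network I/O thread.
    private void onSendComplete() {
        synchronized (producerLock) {
            // Short critical section: read or update shared state, no waiting.
            if (stopping) {
                // A real callback might skip retries here; nothing to do in this sketch.
            }
        }
        pendingSend.countDown();
    }

    // Models stop(): records the state change under the lock, releases the lock,
    // and only then waits for pending sends to complete.
    boolean stopWithoutHoldingLock(long timeoutMs) {
        synchronized (producerLock) {
            stopping = true;
        }
        Thread ioThread = new Thread(this::onSendComplete, "network-io");
        ioThread.setDaemon(true);
        ioThread.start();
        try {
            // The lock is free, so the callback can finish and release the latch.
            return pendingSend.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean completed = new ProducerStopSketch().stopWithoutHoldingLock(5000);
        System.out.println("flush completed: " + completed);
    }
}
```

The design choice here is to keep the critical section limited to state changes and to perform every blocking wait outside it, so the network thread's callback is never stuck behind a thread that is itself waiting for that callback.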


          People

            xinyu Xinyu Liu
            nickpan47 Yi Pan