Kafka / KAFKA-10563

Make sure task directories don't remain locked by dead threads

Details

    Description

      Most common or expected exceptions within Streams are handled gracefully, and the thread makes sure to clean up all resources, such as task locks, during shutdown. However, in some cases an unexpected exception such as an IllegalStateException can leave some resources orphaned.

      We have seen this happen to task directories after an IllegalStateException is hit during the TaskManager's rebalance handling logic: the thread shuts down but loses track of some tasks before unlocking them. This blocks any further work on those tasks by any other thread in the same instance.

      Previously we decided that this was "ok" because an IllegalStateException means all bets are off. But with the upcoming work of KIP-663 and KIP-671, users will be able to react to dying threads and replace them with new ones, which makes it more important than ever to ensure that the application can continue with no lasting repercussions from a thread death. If we allow users to revive or replace a thread that dies due to an IllegalStateException, the replacement thread should not be blocked from doing any work by the ghost of its predecessor.

      It might be easiest to just add some logic to the cleanup thread to verify all the existing locks against the list of live threads and remove any zombie locks. But we probably want to do this purging more frequently than the cleanup thread runs (every 10 minutes by default), so maybe we can leverage the work in KIP-671 and have each thread purge any locks it still owns after the uncaught exception handler runs, but before the thread dies.
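
      To make the two options concrete, here is a rough sketch of what each purge could look like. It is not the actual Streams code: TaskLockRegistry, purgeZombieLocks and releaseAllOwnedBy are hypothetical names made up for illustration, and the sketch assumes locks are tracked in memory as a map from task id to owning thread.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the per-instance task lock bookkeeping.
// This is NOT the Kafka Streams API, only an illustration of the idea.
class TaskLockRegistry {
    // task id (e.g. "0_1") -> thread currently holding the task directory lock
    private final Map<String, Thread> owners = new ConcurrentHashMap<>();

    // Returns true if the calling thread now holds (or already held) the lock.
    boolean lock(String taskId) {
        Thread me = Thread.currentThread();
        Thread owner = owners.putIfAbsent(taskId, me);
        return owner == null || owner == me;
    }

    // Option 1: periodic sweep (e.g. from the cleanup thread) that releases
    // every lock whose owning thread is no longer alive.
    Set<String> purgeZombieLocks() {
        Set<String> purged = new HashSet<>();
        for (Map.Entry<String, Thread> e : owners.entrySet()) {
            Thread owner = e.getValue();
            // remove(key, value) only succeeds if the dead thread is still the owner
            if (!owner.isAlive() && owners.remove(e.getKey(), owner)) {
                purged.add(e.getKey());
            }
        }
        return purged;
    }

    // Option 2: called by a dying thread on its way out (after the uncaught
    // exception handler has run) to release everything it still owns.
    Set<String> releaseAllOwnedBy(Thread thread) {
        Set<String> released = new HashSet<>();
        for (Map.Entry<String, Thread> e : owners.entrySet()) {
            if (e.getValue() == thread && owners.remove(e.getKey(), thread)) {
                released.add(e.getKey());
            }
        }
        return released;
    }
}
```

      With option 2, a thread would call releaseAllOwnedBy(Thread.currentThread()) in a finally block at the end of its run loop, so the locks are gone before a replacement thread starts. Option 1 needs no cooperation from the dying thread, but a zombie lock can survive until the next sweep.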

          People

            Assignee: Unassigned
            Reporter: A. Sophie Blee-Goldman (ableegoldman)
            Votes: 0
            Watchers: 4
