  1. Kafka
  2. KAFKA-1565

Transaction manager failover handling


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate

    Description

      The transaction manager should guarantee that, once a pre-commit/pre-abort request is acknowledged, the corresponding commit/abort request will be delivered to the partitions involved in the transaction.

      In particular, we handle the following failover scenarios:

      1) The transaction manager or its followers fail before the txRequest is replicated to the local log and the followers.
      Solution: The transaction manager responds to the request with an error status if it is alive. The producer keeps retrying the commit.

      2) The txPartition’s leader is not available.
      Solution: Put the txRequest on unSentTxRequestQueue. When metadataCache is updated, check unSentTxRequestQueue and re-send each txRequest whose leader is now available.

      3) The txPartition’s leader fails while the txRequest is in the channel manager.
      Solution: Retrieve all txRequests queued for transmission to this broker and put them on unSentTxRequestQueue.

      4) The transaction manager does not receive a success response from the txPartition’s leaders within the timeout period.
      Solution: The transaction manager expires the txRequest and re-sends it.

      5) The transaction manager fails.
      Solution: The new transaction manager reads transactionHW from ZooKeeper and sends txRequests starting from the transactionHW.
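
      The handling for scenarios 2 through 5 can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the code in KAFKA-1565.patch: the class TxFailoverSketch, its method names, the in-memory maps standing in for metadataCache and the channel manager, and the choice to treat transactionHW as an inclusive offset are all assumptions made for the example. Only unSentTxRequestQueue, txRequest, txPartition and transactionHW are names taken from the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of the failover scenarios; all names except those in the description are assumed. */
class TxFailoverSketch {
    /** A commit/abort request for one partition; offset is its assumed position in the transaction log. */
    static final class TxRequest {
        final long offset;
        final String txPartition;
        long sentAtMs = -1; // time of the most recent send attempt
        TxRequest(long offset, String txPartition) {
            this.offset = offset;
            this.txPartition = txPartition;
        }
    }

    // Stand-in for metadataCache: partition -> leader broker id (no entry means no leader).
    final Map<String, Integer> leaders = new HashMap<>();
    // Stand-in for the channel manager: broker id -> requests sent but not yet acknowledged.
    final Map<Integer, List<TxRequest>> inFlight = new HashMap<>();
    final Queue<TxRequest> unSentTxRequestQueue = new ArrayDeque<>();

    /** Scenario 2: park the request if the partition currently has no leader. */
    void send(TxRequest req, long nowMs) {
        Integer leader = leaders.get(req.txPartition);
        if (leader == null) {
            unSentTxRequestQueue.add(req);
            return;
        }
        req.sentAtMs = nowMs;
        inFlight.computeIfAbsent(leader, k -> new ArrayList<>()).add(req);
    }

    /** Scenario 2: on a metadata update, retry everything that was parked. */
    void onMetadataUpdate(String txPartition, int leader, long nowMs) {
        leaders.put(txPartition, leader);
        int pending = unSentTxRequestQueue.size();
        for (int i = 0; i < pending; i++) {
            send(unSentTxRequestQueue.poll(), nowMs); // re-parks itself if still leaderless
        }
    }

    /** Scenario 3: a broker died; move its queued requests back to the unsent queue. */
    void onBrokerFailure(int brokerId) {
        leaders.values().removeIf(id -> id == brokerId);
        List<TxRequest> stranded = inFlight.remove(brokerId);
        if (stranded != null) {
            unSentTxRequestQueue.addAll(stranded);
        }
    }

    /** Scenario 4: expire requests with no success response within timeoutMs and re-send them. */
    void expireAndResend(long nowMs, long timeoutMs) {
        List<TxRequest> expired = new ArrayList<>();
        for (List<TxRequest> reqs : inFlight.values()) {
            Iterator<TxRequest> it = reqs.iterator();
            while (it.hasNext()) {
                TxRequest r = it.next();
                if (nowMs - r.sentAtMs >= timeoutMs) {
                    it.remove();
                    expired.add(r);
                }
            }
        }
        for (TxRequest r : expired) {
            send(r, nowMs);
        }
    }

    /** Scenario 5: a new manager selects the logged requests at or after transactionHW (inclusive by assumption). */
    static List<TxRequest> requestsToResend(List<TxRequest> txLog, long transactionHW) {
        List<TxRequest> out = new ArrayList<>();
        for (TxRequest r : txLog) {
            if (r.offset >= transactionHW) {
                out.add(r);
            }
        }
        return out;
    }
}
```

      In this sketch a request is never dropped: it is either in flight to a known leader or parked on unSentTxRequestQueue. Re-sending after a timeout or a failover can deliver the same commit/abort more than once, so the receiving partitions would need to tolerate duplicates.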

      Attachments

        1. KAFKA-1565.patch
          47 kB
          Dong Lin


          People

            lindong Dong Lin
            Votes: 0
            Watchers: 2
