  1. Samza
  2. SAMZA-1572

Add fixed retries on failure in KafkaCheckpointManager


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.1
    • Component/s: None
    • Labels: None

    Description

      KafkaCheckpointManager.writeCheckpoint currently goes into an infinite loop when an unrecoverable failure happens, which blocks the commit phase indefinitely (thereby preventing processing). The exception is revealed only during shutdown of the job, and shutdown then blocks indefinitely as well, because the run loop is stuck in the commit phase and ignores the shutdown markers.

      2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying.
      2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
      2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
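
The timestamps in the log suggest the retry delay doubles from roughly 100 ms up to a 10 s ceiling and then repeats forever. A fixed-retry variant of that loop could look roughly like the sketch below. This is only an illustration under stated assumptions, not the change that was actually committed for this issue: the class `BoundedRetry`, the `Sleeper` hook, `RetriesExhaustedException`, and all parameter names are hypothetical and do not come from the Samza code base.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Samza): retries an action a fixed number
// of times with capped exponential backoff, then fails instead of looping forever.
public class BoundedRetry {

    /** Thrown once every attempt has failed; the cause is the last failure seen. */
    public static class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical hook so callers (and tests) can control how waiting is done. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    public static <T> T callWithRetries(Callable<T> action, int maxAttempts,
                                        long initialDelayMs, long maxDelayMs,
                                        Sleeper sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when interrupted: restore the flag and stop.
                Thread.currentThread().interrupt();
                throw new RetriesExhaustedException("Interrupted during attempt " + attempt, e);
            } catch (Exception e) {
                lastFailure = e;
            }
            if (attempt < maxAttempts) {
                try {
                    sleeper.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException("Interrupted while backing off", e);
                }
                // Double the delay, but never exceed the ceiling.
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        // The bound was reached: surface the failure so the caller can shut down.
        throw new RetriesExhaustedException(
            "Gave up after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        // Stand-in for a checkpoint write whose producer cannot recover.
        try {
            callWithRetries(() -> {
                throw new IllegalStateException("Producer was unable to recover");
            }, 3, 100, 10_000, Thread::sleep);
        } catch (RetriesExhaustedException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getMessage());
        }
    }
}
```

Because the helper throws once the attempt limit is reached, the caller gets the exception back rather than staying blocked in the commit phase, which would let the run loop observe shutdown markers again.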
      


            People

              spvenkat Shanthoosh Venkataraman