Kafka / KAFKA-4071

Corrupted replication-offset-checkpoint leaves Kafka server dysfunctional


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0.1
    • Fix Version/s: None
    • Component/s: clients, offset manager
    • Labels:
      None
    • Environment:
      Red Hat Enterprise Linux 6.7

      Description

      For an unknown reason, [kafka data root]/replication-offset-checkpoint became corrupted. Kafka first reported a NumberFormatException in the Kafka server.out log, and then repeatedly logged "error when handling request Name: FetchRequest; ..." errors (error details below). As a result, clients could not read from or write to several partitions until the replication-offset-checkpoint file was deleted manually.
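The NumberFormatException is thrown while the broker parses the checkpoint file's integer fields, and it escapes into request handling. As a minimal sketch of how a reader could instead detect the corruption, the hypothetical `CheckpointReader` below validates every field and returns `Optional.empty()` on malformed input. It assumes the plain-text layout of a version line, an entry-count line, and then one `topic partition offset` line per entry; it is an illustration, not Kafka's `OffsetCheckpoint` implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Kafka.
final class CheckpointReader {

    // Assumed layout: version line, entry-count line, then "topic partition offset" lines.
    static final int EXPECTED_VERSION = 0;

    // Returns the offsets keyed by "topic-partition", or Optional.empty() if the
    // content is malformed, so the caller can handle corruption in one place.
    static Optional<Map<String, Long>> parse(String content) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            Integer version = parseIntOrNull(reader.readLine());
            if (version == null || version != EXPECTED_VERSION) {
                return Optional.empty(); // missing, garbled, or unknown version
            }
            Integer expectedSize = parseIntOrNull(reader.readLine());
            if (expectedSize == null || expectedSize < 0) {
                return Optional.empty(); // garbled entry count
            }
            Map<String, Long> offsets = new HashMap<>();
            for (int i = 0; i < expectedSize; i++) {
                String line = reader.readLine();
                if (line == null) {
                    return Optional.empty(); // fewer entries than declared (truncated file)
                }
                String[] fields = line.trim().split("\\s+");
                if (fields.length != 3) {
                    return Optional.empty();
                }
                Integer partition = parseIntOrNull(fields[1]);
                Long offset = parseLongOrNull(fields[2]);
                if (partition == null || offset == null || partition < 0 || offset < 0) {
                    return Optional.empty();
                }
                offsets.put(fields[0] + "-" + partition, offset);
            }
            return Optional.of(offsets);
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Integer.parseInt throws NumberFormatException on garbage; map that to null.
    private static Integer parseIntOrNull(String s) {
        if (s == null) return null;
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    private static Long parseLongOrNull(String s) {
        if (s == null) return null;
        try {
            return Long.parseLong(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

A caller that receives `Optional.empty()` could log the corruption once and proceed as if the file were absent, which is roughly the state the reporter reached by deleting the file by hand. Whether that fallback is safe for the broker's replication state is a question this sketch does not settle.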

      Can the Kafka broker handle this error and recover from it?
      And what caused this file to become corrupted? Only this one file was corrupted, and no noticeable disk failure was detected.
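The report does not establish why the file was corrupted. One possible cause, which the report neither confirms nor rules out, is a crash or power loss while the file was being rewritten in place. As a sketch under that assumption (not a claim about how Kafka 0.9.0.1 writes this file), the hypothetical `CheckpointWriter.writeAtomically` writes to a sibling `.tmp` file, forces it to disk, and then renames it over the target, so a reader sees either the old or the new contents. It would not protect against corruption introduced by the disk or filesystem after the write.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Kafka.
final class CheckpointWriter {

    static void writeAtomically(Path target, String content) throws IOException {
        // Write the new contents to a temporary sibling file first.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush contents and metadata to the device before renaming
        }
        // On POSIX filesystems ATOMIC_MOVE maps to rename(2), which replaces an
        // existing target in one step; elsewhere replacement is implementation specific.
        // A fully durable variant would also fsync the parent directory (omitted here).
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```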

      ERROR [KafkaApi-7] error when handling request
      java.lang.NumberFormatException: For input string: " N?-; O"
      at java.lang.NumberFormatException.forInputString(NumberFormatException.java:77)
      at java.lang.Integer.parseInt(Integer.java:493)
      at java.lang.Integer.parseInt(Integer.java:539)
      at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
      at scala.collection.immutable.StringOps.toInt(StringOps.scala:30)
      at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:78)
      at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:93)
      at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
      at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
      at scala.collection.immutable.Set$Set2.foreach(Set.scala:111)
      at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:173)

      ERROR [KafkaApi-7] error when handling request Name: FetchRequest; Version: 1; CorrelationId: 0; ClientId: ReplicaFetcherThread-1-7; ReplicaId: 6; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [prodTopicDal09E,166] -> PartitionFetchInfo(7123666,20971520),[prodTopicDal09E,118] -> PartitionFetchInfo(7128188,20971520),[prodTopicDal09E,238] ->

            People

            • Assignee:
              Unassigned
              Reporter:
              Zane Zhang (zanezhang)
            • Votes:
              0
              Watchers:
              2
