  Kafka / KAFKA-8722

Data crc check repair


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.2
    • Fix Version/s: 0.10.2.3
    • Component/s: log
    • Labels: None
    • Flags: Patch

    Description

      In our production environment, a running program that consumes a Kafka topic failed with the following error:

      org.apache.kafka.common.KafkaException: Record for partition rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is corrupt (stored crc = 3580880396, computed crc = 1701403171)
      at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869)
      at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788)
      at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480)
      at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188)
      at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046)
      at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88)
      at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
      at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
      at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
      at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)

      We then used the kafka.tools.DumpLogSegments tool to parse the log file on disk, which confirmed that it did contain corrupt ("dirty") data.
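      The consumer error above reports a mismatch between the CRC stored in a record and the CRC recomputed from the record's bytes. The Java sketch below illustrates that comparison, assuming the magic v1 message format introduced in 0.10.0 (crc | magic | attributes | timestamp | key length | key | value length | value), where the CRC32 covers every byte after the CRC field; the offset and size header of a log entry is left out. The class and method names are hypothetical, and this is a simplified illustration, not Kafka's own implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of a CRC check on a v1-style message:
 *   crc (4) | magic (1) | attributes (1) | timestamp (8) | key length (4) | key | value length (4) | value
 * The stored CRC32 covers every byte after the 4-byte CRC field.
 */
public final class LegacyCrcCheck {
    private static final int CRC_OFFSET = 0;
    private static final int MAGIC_OFFSET = 4;

    /** Builds a v1-style message with a null key and a correct CRC. */
    static byte[] buildRecord(long timestamp, byte[] value) {
        int size = 4 + 1 + 1 + 8 + 4 + 4 + value.length;
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        buf.position(MAGIC_OFFSET);
        buf.put((byte) 1);        // magic v1
        buf.put((byte) 0);        // attributes: no compression
        buf.putLong(timestamp);
        buf.putInt(-1);           // key length -1 means a null key
        buf.putInt(value.length);
        buf.put(value);
        buf.putInt(CRC_OFFSET, (int) computeCrc(buf.array()));
        return buf.array();
    }

    /** Recomputes the CRC32 over everything after the CRC field. */
    static long computeCrc(byte[] record) {
        CRC32 crc = new CRC32();
        crc.update(record, MAGIC_OFFSET, record.length - MAGIC_OFFSET);
        return crc.getValue();
    }

    /** Reads the stored CRC as an unsigned 32-bit value. */
    static long storedCrc(byte[] record) {
        return ByteBuffer.wrap(record).getInt(CRC_OFFSET) & 0xFFFFFFFFL;
    }

    /** A record is valid when the stored and recomputed CRCs agree. */
    static boolean isValid(byte[] record) {
        return storedCrc(record) == computeCrc(record);
    }

    public static void main(String[] args) {
        byte[] record = buildRecord(1564210000000L, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact record valid: " + isValid(record));

        record[record.length - 1] ^= 0x01; // flip one bit in the value, simulating corruption
        System.out.printf("stored crc = %d, computed crc = %d, valid = %b%n",
                storedCrc(record), computeCrc(record), isValid(record));
    }
}
```

      Because CRC32 detects every single-bit error, flipping any one bit after the CRC field is enough to make the stored and computed values disagree, which is the situation the consumer reported.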

      Reading the code, we found that in some cases Kafka writes data to disk without verifying it, so we fixed this.
      Specifically, when record.offset does not equal the expected offset, Kafka sets the variable inPlaceAssignment to false, and when inPlaceAssignment is false, the data is not verified.
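      To make the control flow described above concrete, here is a heavily simplified Java model. It is not Kafka's LogValidator: InnerRecord, acceptedBeforeFix and acceptedAfterFix are hypothetical names, a boolean stands in for the stored-versus-computed CRC comparison, and the "after" path only assumes that the fix verifies every record regardless of inPlaceAssignment, which matches the observed result that the repaired broker rejects the producer's write.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the validation flow described in this issue; not Kafka code. */
public final class ValidationFlowSketch {

    /** Stand-in for an inner record of a compressed wrapper message. */
    static final class InnerRecord {
        final long offset;
        final boolean crcMatches; // stands in for "stored CRC == computed CRC"

        InnerRecord(long offset, boolean crcMatches) {
            this.offset = offset;
            this.crcMatches = crcMatches;
        }
    }

    /** Reported behaviour: records are only verified while inPlaceAssignment stays true. */
    static boolean acceptedBeforeFix(List<InnerRecord> records) {
        boolean inPlaceAssignment = true;
        long expectedOffset = 0; // relative inner offsets are expected to be 0, 1, 2, ...
        for (InnerRecord r : records) {
            if (r.offset != expectedOffset++) {
                inPlaceAssignment = false;
            }
        }
        if (!inPlaceAssignment) {
            return true; // records are rebuilt with new offsets and written without verification
        }
        for (InnerRecord r : records) {
            if (!r.crcMatches) {
                return false;
            }
        }
        return true;
    }

    /** Assumed effect of the fix: every record is verified on both paths. */
    static boolean acceptedAfterFix(List<InnerRecord> records) {
        for (InnerRecord r : records) {
            if (!r.crcMatches) {
                return false; // reject the write, so the producer sees an error
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The second record has an unexpected offset (5 instead of 1) and a bad CRC.
        List<InnerRecord> batch = Arrays.asList(new InnerRecord(0, true), new InnerRecord(5, false));
        System.out.println("accepted before fix: " + acceptedBeforeFix(batch)); // true
        System.out.println("accepted after fix:  " + acceptedAfterFix(batch));  // false
    }
}
```

      In this model the unexpected offset switches off verification in the "before" path, so the corrupt record is accepted and only fails later at the consumer, whereas the "after" path rejects it at write time, mirroring the comparative test described in this issue.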

      Our fix is shown in the attached screenshots.

      We ran a comparative test. By modifying the client-side producer code, we generated corrupt ("dirty") data. The original Kafka version wrote this data to disk normally, and the corruption was only reported when the data was consumed. Our repaired version verifies the data at write time, so the producer's write fails.

      With the original version, the consumer reports an error when it reads this data.

      After the Kafka broker is replaced with the repaired version, the dirty data is verified when the producer writes it, and this time the producer's write fails.

    Attachments

      1. image-2019-07-27-14-50-08-128.png (392 kB, ChenLin)
      2. image-2019-07-27-14-50-58-300.png (389 kB, ChenLin)
      3. image-2019-07-27-14-56-25-610.png (405 kB, ChenLin)
      4. image-2019-07-27-14-57-06-687.png (593 kB, ChenLin)
      5. image-2019-07-27-15-05-12-565.png (445 kB, ChenLin)
      6. image-2019-07-27-15-06-07-123.png (301 kB, ChenLin)
      7. image-2019-07-27-15-10-21-709.png (140 kB, ChenLin)
      8. image-2019-07-27-15-18-22-716.png (419 kB, ChenLin)
      9. image-2019-07-30-11-39-01-605.png (268 kB, ChenLin)


    People

      Assignee: Unassigned
      Reporter: ChenLin (LordChen)
      Votes: 0
      Watchers: 2
