Hadoop HDFS / HDFS-16161

Corrupt block checksum is not reported to NameNode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component: datanode

    Description

      One of our users reported this error in the log:

      2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: an02nphda5777.example.com:1004:DataXceiver error processing READ_BLOCK operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
      java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
              at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
              at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
      

      Analysis:
      It looks like the first few bytes of the checksum metadata were bad. Those bytes determine the type of checksum (CRC32, CRC32C, etc.). But the block was never reported to the NameNode as corrupt and removed.
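The "id=-46" in the log falls out of reading a corrupt header byte as a signed value. A minimal, self-contained sketch of the failure mode (not the actual HDFS code; the enum below only mirrors the id range check in org.apache.hadoop.util.DataChecksum.Type):

```java
public class ChecksumTypeDemo {
    // Mirrors the id range [0, 5) checked by DataChecksum.Type.valueOf
    enum Type {
        NULL, CRC32, CRC32C, DEFAULT, MIXED;

        static Type valueOf(int id) {
            if (id < 0 || id >= values().length) {
                // Same failure mode as seen in the log above
                throw new IllegalArgumentException(
                    "id=" + id + " out of range [0, " + values().length + ")");
            }
            return values()[id];
        }
    }

    public static void main(String[] args) {
        // A corrupt header byte such as 0xD2 widens to -46 as an int
        byte corrupt = (byte) 0xD2;
        try {
            Type.valueOf(corrupt);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // id=-46 out of range [0, 5)
        }
    }
}
```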

      If the DN hits an IOException while reading a block, it starts another thread to scan the block, and if the block is indeed bad, it tells the NN it's got a bad block. But this is an IllegalArgumentException, which is a RuntimeException rather than an IOException, so it's not handled that way.
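The gap can be seen with a toy stand-in for the DN read path (hypothetical names, not the real DataXceiver code): a catch clause scoped to IOException simply never sees the RuntimeException, so the suspect-block path is skipped.

```java
import java.io.IOException;

public class ErrorPathDemo {
    // Stand-in flag for "block queued for rescan / NN report"
    static boolean markedSuspect = false;

    // Stand-in for parsing the meta header; a corrupt header byte surfaces
    // as an unchecked IllegalArgumentException, not an IOException
    static void parseHeader(boolean corrupt) throws IOException {
        if (corrupt) {
            throw new IllegalArgumentException("id=-46 out of range [0, 5)");
        }
    }

    static void readBlock(boolean corrupt) {
        try {
            parseHeader(corrupt);
        } catch (IOException ioe) {
            // Only this path triggers the suspect-block handling
            markedSuspect = true;
        }
    }

    public static void main(String[] args) {
        try {
            readBlock(true);
        } catch (RuntimeException re) {
            // The IllegalArgumentException escapes past the IOException handler...
        }
        // ...so the block is never flagged for the scan that reports it to the NN
        System.out.println("markedSuspect=" + markedSuspect); // markedSuspect=false
    }
}
```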

      It's a bug in the error handling code, which should be made more graceful.

      Suggestion: catch the IllegalArgumentException in BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException instead, so that the DN catches the exception and performs the regular block scan check.
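The suggested fix amounts to catching the unchecked exception at the header-parsing boundary and rethrowing it as a checked IOException subtype. A hedged sketch of that shape (a simplified stand-in for BlockMetadataHeader.preadHeader(); the exception class here is a local mock, not the real HDFS one):

```java
import java.io.IOException;

public class WrapDemo {
    // Local stand-in for the IOException subclass the DN's read path handles
    static class CorruptMetaHeaderException extends IOException {
        CorruptMetaHeaderException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Sketch of the proposed change: wrap the RuntimeException so callers
    // on the IOException path see a corrupt header as an IOException
    static void preadHeader(int checksumTypeId) throws IOException {
        try {
            if (checksumTypeId < 0 || checksumTypeId >= 5) {
                throw new IllegalArgumentException(
                    "id=" + checksumTypeId + " out of range [0, 5)");
            }
            // ... construct the DataChecksum from the header here ...
        } catch (IllegalArgumentException iae) {
            throw new CorruptMetaHeaderException("Corrupt meta header", iae);
        }
    }

    public static void main(String[] args) {
        try {
            preadHeader(-46);
        } catch (IOException e) {
            // The DN's existing IOException handling now applies: it can
            // queue the block for a scan and report it to the NameNode
            System.out.println(e.getMessage()); // Corrupt meta header
        }
    }
}
```

The original cause is preserved via the exception's cause chain, so the log still shows which header byte was bad.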

            People

              Assignee: Unassigned
              Reporter: Wei-Chiu Chuang (weichiu)

              Dates

                Created:
                Updated:
                Resolved: