Hadoop HDFS / HDFS-3875

Issue handling checksum errors in write pipeline

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: 2.1.0-beta, 0.23.8
    • Component/s: datanode, hdfs-client
    • Labels: None

    Description

      We saw this issue with one block in a large test cluster. The client was writing the data with replication level 2, and we saw the following:

      • The second node in the pipeline detected a checksum error on the data it received from the first node. We don't know whether the client sent a bad checksum or the data was corrupted between node 1 and node 2 in the pipeline.
      • This caused the second node to be kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline).
      • This replica was later determined to be corrupt by the block scanner, and it was unrecoverable since it was the only replica.
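
      The ambiguity in the first bullet can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HDFS code and not the fix attached to this issue: the class `PipelineChecksumSketch`, the method `detectingNode`, and the single-checksum-per-packet model are all hypothetical, and nodes are indexed from 0 (so index 1 is "the second node"). A bad packet from the client is modeled as corruption on the link into node 0. When only the last node verifies, both corruption points are reported by the same node; when every node verifies on receipt, the detecting node localizes the fault.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical toy model of a write pipeline; not taken from the HDFS source.
class PipelineChecksumSketch {

    // CRC32 over the whole payload. Assumption: one checksum per packet,
    // which is a simplification chosen for this sketch.
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /**
     * Pushes one packet through a pipeline of `nodes` datanodes.
     *
     * @param payload            non-empty packet data sent by the client
     * @param nodes              number of datanodes in the pipeline
     * @param corruptBeforeNode  index of the node whose incoming link flips a bit,
     *                           or -1 for no corruption
     * @param verifyAtEveryNode  if false, only the last node verifies the checksum
     * @return index of the node that detects the mismatch, or -1 if none does
     */
    static int detectingNode(byte[] payload, int nodes,
                             int corruptBeforeNode, boolean verifyAtEveryNode) {
        long checksum = crc(payload);        // computed once by the client
        byte[] inFlight = payload.clone();   // data as it travels down the pipeline
        for (int node = 0; node < nodes; node++) {
            if (node == corruptBeforeNode) {
                inFlight[0] ^= 0x01;         // simulate a single bit flip in transit
            }
            boolean verifies = verifyAtEveryNode || node == nodes - 1;
            if (verifies && crc(inFlight) != checksum) {
                return node;                 // this node would throw and leave the pipeline
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "block data".getBytes(StandardCharsets.UTF_8);
        // Only the tail verifies: both corruption points look identical.
        System.out.println(detectingNode(data, 2, 0, false)); // prints 1
        System.out.println(detectingNode(data, 2, 1, false)); // prints 1
        // Every node verifies: the detecting node identifies the bad hop.
        System.out.println(detectingNode(data, 2, 0, true));  // prints 0
        System.out.println(detectingNode(data, 2, 1, true));  // prints 1
    }
}
```

      In this model, the scenario described above corresponds to the first two calls: the second node reports the error in both cases, so dropping it from the pipeline can leave the corrupt copy on the first node as the only replica.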

      Attachments

        1. hdfs-3875-wip.patch (14 kB, Kihwal Lee)
        2. hdfs-3875.trunk.with.test.patch.txt (14 kB, Kihwal Lee)
        3. hdfs-3875.trunk.with.test.patch.txt (14 kB, Kihwal Lee)
        4. hdfs-3875.trunk.patch.txt (15 kB, Kihwal Lee)
        5. hdfs-3875.trunk.patch.txt (14 kB, Kihwal Lee)
        6. hdfs-3875.trunk.no.test.patch.txt (8 kB, Kihwal Lee)
        7. hdfs-3875.trunk.no.test.patch.txt (8 kB, Kihwal Lee)
        8. hdfs-3875.patch.txt (18 kB, Kihwal Lee)
        9. hdfs-3875.patch.txt (18 kB, Kihwal Lee)
        10. hdfs-3875.patch.txt (18 kB, Kihwal Lee)
        11. hdfs-3875.branch-2.patch.txt (18 kB, Kihwal Lee)
        12. hdfs-3875.branch-0.23.with.test.patch.txt (12 kB, Kihwal Lee)
        13. hdfs-3875.branch-0.23.patch.txt (17 kB, Kihwal Lee)
        14. hdfs-3875.branch-0.23.patch.txt (18 kB, Kihwal Lee)
        15. hdfs-3875.branch-0.23.no.test.patch.txt (8 kB, Kihwal Lee)

          People

            Assignee: Kihwal Lee (kihwal)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 19
