Hadoop HDFS / HDFS-1239: All datanodes are bad in 2nd phase


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      • Setup:
        number of datanodes = 2
        replication factor = 2
        type of failure: transient fault (a Java I/O call throws an exception or returns false)
        number of failures = 2
        when/where failures happen: during the 2nd phase of the pipeline, one at each datanode while it performs I/O
        (e.g. dataoutputstream.flush())
      • Details:

      This is similar to HDFS-1237.
      In this case, node1 throws an exception, which makes the client rebuild
      the pipeline with only node2 and redo the whole transfer; that attempt
      fails as well. At this point the client considers all datanodes bad and
      never retries (i.e. it never goes back to the namenode to ask for a new
      set of datanodes), even though both faults were transient. In HDFS-1237
      the bug is due to a permanent disk fault; here it is a transient error.
      A simplified sketch of the failing recovery loop follows.
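
      The sketch below is a minimal, self-contained model of the recovery loop
      described above (hypothetical names, not the actual DFSClient code): each
      datanode's flush throws exactly once, modeling the injected transient
      fault, and the client drops failed nodes one by one until the pipeline is
      empty, aborting without ever asking the namenode for replacements.

      {code:java}
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      public class TransientFaultPipelineSketch {

          /** Datanode stub whose first flush throws, modeling the injected transient fault. */
          static class DataNodeStub {
              final String name;
              boolean failNextFlush = true; // transient: fails once, would succeed on retry

              DataNodeStub(String name) { this.name = name; }

              void flush() throws IOException {
                  if (failNextFlush) {
                      failNextFlush = false;   // the fault is transient, not permanent
                      throw new IOException(name + ": flush failed");
                  }
              }
          }

          /**
           * Simplified recovery loop: on an I/O error, drop the failed node and
           * retry with the survivors. Nothing here ever goes back to the namenode
           * for replacement nodes, so two faults on a two-node pipeline exhaust it.
           */
          static void secondPhaseWrite(List<DataNodeStub> pipeline) throws IOException {
              while (true) {
                  if (pipeline.isEmpty()) {
                      // The state the report describes: no namenode retry, just abort.
                      throw new IOException("All datanodes are bad. Aborting...");
                  }
                  try {
                      for (DataNodeStub dn : pipeline) {
                          dn.flush();          // 2nd-phase I/O, e.g. dataoutputstream.flush()
                      }
                      return;                  // success
                  } catch (IOException e) {
                      pipeline.remove(0);      // simplified: the head of the pipeline failed
                      System.out.println("recovering from: " + e.getMessage()
                              + " (" + pipeline.size() + " node(s) left)");
                  }
              }
          }

          public static void main(String[] args) {
              List<DataNodeStub> pipeline = new ArrayList<>(
                      List.of(new DataNodeStub("node1"), new DataNodeStub("node2")));
              try {
                  secondPhaseWrite(pipeline);
              } catch (IOException e) {
                  System.out.println(e.getMessage()); // "All datanodes are bad. Aborting..."
              }
          }
      }
      {code}

      Note that both stubs would succeed on a second flush; the abort happens
      only because a removed node is never retried and no replacement is
      requested from the namenode.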

      This bug was found by our Failure Testing Service framework:
      http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
      For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
      Haryadi Gunawi (haryadi@eecs.berkeley.edu)


People

    • Assignee: Unassigned
    • Reporter: Thanh Do
    • Votes: 0
    • Watchers: 4
