  1. Hadoop Common
  2. HADOOP-4517

unstable dfs when running jobs on 0.18.1

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Two attempts of a job using 6000 maps and 1900 reduces:

      1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; dfs lost blocks.
      2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception; dfs lost blocks, which were responsible for the job failure.

      I will post a typical datanode exception and attach a thread dump.

        Attachments

        1. 4517_20081024.patch
          3 kB
          Tsz Wo Nicholas Sze
        2. 4517_20081024b_0.18.patch
          3 kB
          Tsz Wo Nicholas Sze
        3. 4517_20081024c_0.18.patch
          3 kB
          Tsz Wo Nicholas Sze
        4. 4517_20081024d_0.18.patch
          3 kB
          Tsz Wo Nicholas Sze
        5. 4517_20081024d.patch
          3 kB
          Tsz Wo Nicholas Sze
        6. 4517_20081027_0.18.patch
          4 kB
          Tsz Wo Nicholas Sze
        7. 4517_20081027.patch
          4 kB
          Tsz Wo Nicholas Sze
        8. datanode.out
          301 kB
          Christian Kunz

            People

            • Assignee:
              szetszwo Tsz Wo Nicholas Sze
              Reporter:
              ckunz Christian Kunz
            • Votes:
              0
              Watchers:
              3
