  1. Hadoop Common
  2. HADOOP-4517

unstable dfs when running jobs on 0.18.1

Details

    • Reviewed

    Description

      2 attempts of a job using 6000 maps, 1900 reduces

      1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; dfs lost blocks.
      2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception; dfs lost blocks, which caused the job failure.

      I will post a typical datanode exception and attach a thread dump.

      Attachments

        1. datanode.out
          301 kB
          Christian Kunz
        2. 4517_20081027.patch
          4 kB
          Tsz-wo Sze
        3. 4517_20081027_0.18.patch
          4 kB
          Tsz-wo Sze
        4. 4517_20081024d.patch
          3 kB
          Tsz-wo Sze
        5. 4517_20081024d_0.18.patch
          3 kB
          Tsz-wo Sze
        6. 4517_20081024c_0.18.patch
          3 kB
          Tsz-wo Sze
        7. 4517_20081024b_0.18.patch
          3 kB
          Tsz-wo Sze
        8. 4517_20081024.patch
          3 kB
          Tsz-wo Sze

          People

            szetszwo Tsz-wo Sze
            ckunz Christian Kunz