Hadoop Common / HADOOP-4517

unstable dfs when running jobs on 0.18.1

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Two attempts at a job using 6000 maps and 1900 reduces:

      1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; dfs lost blocks.
      2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception; dfs lost blocks, which was responsible for the job failure.

      I will post a typical datanode exception and attach a thread dump.

      Attachments

      1. datanode.out
        301 kB
        Christian Kunz
      2. 4517_20081024.patch
        3 kB
        Tsz Wo Nicholas Sze
      3. 4517_20081024b_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      4. 4517_20081024c_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      5. 4517_20081024d_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      6. 4517_20081024d.patch
        3 kB
        Tsz Wo Nicholas Sze
      7. 4517_20081027.patch
        4 kB
        Tsz Wo Nicholas Sze
      8. 4517_20081027_0.18.patch
        4 kB
        Tsz Wo Nicholas Sze

          People

          • Assignee:
            Tsz Wo Nicholas Sze
            Reporter:
            Christian Kunz
          • Votes:
            0
            Watchers:
            3
