Hadoop YARN / YARN-3775

Job does not exit after all nodes become unhealthy


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      When running TeraSort with a data size of 10 GB, all containers exit once the disk space threshold of 0.90 is reached. At this point the job does not exit with an error; the client log keeps showing map progress rising and falling back:
      15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%
      15/06/05 13:13:52 INFO mapreduce.Job: map 10% reduce 0%
      15/06/05 13:14:30 INFO mapreduce.Job: map 11% reduce 0%
      15/06/05 13:15:11 INFO mapreduce.Job: map 12% reduce 0%
      15/06/05 13:15:43 INFO mapreduce.Job: map 13% reduce 0%
      15/06/05 13:16:38 INFO mapreduce.Job: map 14% reduce 0%
      15/06/05 13:16:41 INFO mapreduce.Job: map 15% reduce 0%
      15/06/05 13:16:53 INFO mapreduce.Job: map 16% reduce 0%
      15/06/05 13:17:24 INFO mapreduce.Job: map 17% reduce 0%
      15/06/05 13:17:53 INFO mapreduce.Job: map 18% reduce 0%
      15/06/05 13:18:36 INFO mapreduce.Job: map 19% reduce 0%
      15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%
      15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%
      15/06/05 13:19:32 INFO mapreduce.Job: map 16% reduce 0%
      15/06/05 13:20:00 INFO mapreduce.Job: map 17% reduce 0%
      15/06/05 13:20:36 INFO mapreduce.Job: map 18% reduce 0%
      15/06/05 13:20:57 INFO mapreduce.Job: map 19% reduce 0%
      15/06/05 13:21:22 INFO mapreduce.Job: map 18% reduce 0%
      15/06/05 13:21:24 INFO mapreduce.Job: map 14% reduce 0%
      15/06/05 13:21:25 INFO mapreduce.Job: map 9% reduce 0%
      15/06/05 13:21:28 INFO mapreduce.Job: map 10% reduce 0%
      15/06/05 13:22:22 INFO mapreduce.Job: map 11% reduce 0%
      15/06/05 13:23:06 INFO mapreduce.Job: map 12% reduce 0%
      15/06/05 13:23:41 INFO mapreduce.Job: map 9% reduce 0%
      15/06/05 13:23:42 INFO mapreduce.Job: map 5% reduce 0%
      15/06/05 13:24:38 INFO mapreduce.Job: map 6% reduce 0%
      15/06/05 13:25:16 INFO mapreduce.Job: map 7% reduce 0%
      15/06/05 13:25:53 INFO mapreduce.Job: map 8% reduce 0%
      15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%

      The last progress update was at 15/06/05 13:26:35.
      The current time is:
      [root@xiachsh11 logs]# date
      Fri Jun 5 19:19:59 EDT 2015
      [root@xiachsh11 logs]#

      [root@xiachsh11 logs]# yarn node -list
      15/06/05 19:20:18 INFO client.RMProxy: Connecting to ResourceManager at xiachsh11.eng.platformlab.ibm.com/9.21.62.234:8032
      Total Nodes:0
      Node-Id Node-State Node-Http-Address Number-of-Running-Containers
      [root@xiachsh11 logs]#
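
The progress log above can be summarized mechanically. The following is a minimal sketch, not part of Hadoop or of the original report: the helper names `parse_progress` and `summarize` are hypothetical, and the timestamp format is assumed to be `yy/MM/dd HH:mm:ss` as it appears in the client output. It counts how often map progress moved backwards and how long the job has gone without an update, using three lines copied from the log and the `date` output shown above.

```python
import re
from datetime import datetime

# Matches client lines such as
# "15/06/05 13:13:28 INFO mapreduce.Job: map 9% reduce 0%".
PROGRESS = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def parse_progress(lines):
    """Return (timestamp, map_pct, reduce_pct) for every progress line found."""
    entries = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            # Assumption: the log timestamp is year/month/day, two digits each.
            ts = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            entries.append((ts, int(match.group(2)), int(match.group(3))))
    return entries


def summarize(entries, now):
    """Count map-progress regressions and the time since the last update."""
    regressions = sum(
        1 for prev, cur in zip(entries, entries[1:]) if cur[1] < prev[1]
    )
    return {
        "regressions": regressions,
        "peak_map_pct": max(e[1] for e in entries),
        "stalled_for": now - entries[-1][0],
    }


# Three lines taken verbatim from the log in this report.
sample = [
    "15/06/05 13:19:03 INFO mapreduce.Job: map 20% reduce 0%",
    "15/06/05 13:19:09 INFO mapreduce.Job: map 15% reduce 0%",
    "15/06/05 13:26:35 INFO mapreduce.Job: map 9% reduce 0%",
]
# Wall-clock time reported by `date` in the description.
now = datetime(2015, 6, 5, 19, 19, 59)

if __name__ == "__main__":
    print(summarize(parse_progress(sample), now))
```

Run against the full log, the same function would show that map progress peaked at 20% and that the last update came several hours before the `date` output, which is the hang described in this report.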

      Attachments

        1. logs.tar.gz
          490 kB
          Chengshun Xia


          People

            Assignee: Unassigned
            Reporter: Chengshun Xia (xiachengshun@yeah.net)
            Votes: 0
            Watchers: 3
