
Details

    • Sub-task
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.23.1, 2.0.0-alpha
    • 0.23.1
    • applicationmaster, mrv2
    • None
    • Reviewed
    • Added test to validate that the AM can crash multiple times and still recover successfully after MAPREDUCE-3846.

    Description

      It looks like recovering from an RM AM dying works very well for a single failure, but if the AM fails multiple times we appear to get into a livelock.

      yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 input output
      12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
      12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
      12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 17
      12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
      12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
      12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
      12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application application_1328302034486_0003 to ResourceManager at HOST/IP:8040
      12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: http://HOST:8088/proxy/application_1328302034486_0003/
      12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
      12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in uber mode : false
      12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
      12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
      12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
      #KILLED AM with kill -9 here
      12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
      12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
      12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
      12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
      12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
      12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
      12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
      12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
      12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
      12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
      12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
      12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
      12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
      12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
      12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
      #killed AM with kill -9 here
      12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 0 time(s).
      12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 1 time(s).
      12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 2 time(s).
      12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
      #It never makes any more progress...
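
      The symptom in the log above is that reported progress goes backwards after the second kill (map 100% drops to map 64%) and then stops. As a rough aid for spotting this in a captured client log, here is a minimal Python sketch. It is not part of the attached patch or test; the names `PROGRESS_RE`, `parse_progress` and `find_regressions` are hypothetical, and it assumes progress lines have the exact `map N% reduce M%` shape shown above.

```python
import re

# Matches client progress lines of the form seen in this report, e.g.
#   12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
# (Assumption: the line format is exactly as in the log above.)
PROGRESS_RE = re.compile(r"mapreduce\.Job:\s+map (\d+)% reduce (\d+)%")


def parse_progress(lines):
    """Return (line_index, map_pct, reduce_pct) for every progress line."""
    samples = []
    for index, line in enumerate(lines):
        match = PROGRESS_RE.search(line)
        if match:
            samples.append((index, int(match.group(1)), int(match.group(2))))
    return samples


def find_regressions(lines):
    """Return consecutive sample pairs where map or reduce progress dropped."""
    samples = parse_progress(lines)
    return [
        (prev, cur)
        for prev, cur in zip(samples, samples[1:])
        if cur[1] < prev[1] or cur[2] < prev[2]
    ]


if __name__ == "__main__":
    # Three lines copied from the log in this report.
    sample_log = [
        "12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%",
        "#killed AM with kill -9 here",
        "12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%",
    ]
    for prev, cur in find_regressions(sample_log):
        print("progress dropped between log lines", prev[0], "and", cur[0])
```

      Run against the full log, the first kill produces no regression (map 17% is followed by map 29%), while the second kill is flagged. A regression alone does not prove a livelock; it only marks where to start reading the AM syslog.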
      

      Attachments

        1. MAPREDUCE-3802-20120213.txt
          6 kB
          Vinod Kumar Vavilapalli
        2. MAPREDUCE-3802-20120213.txt
          6 kB
          Vinod Kumar Vavilapalli
        3. syslog
          683 kB
          Robert Joseph Evans

            People

              vinodkv Vinod Kumar Vavilapalli
              revans2 Robert Joseph Evans
              Votes: 0
              Watchers: 5
