  1. Hadoop YARN
  2. YARN-10440

ResourceManager hangs and I cannot submit any new jobs, but the RM and NM processes are running normally

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      The RM hangs and I cannot submit any new jobs, but the RM and NM processes are still running normally. I can open xxxxx:8088/cluster/apps/RUNNING but cannot open xxxxx:8088/cluster/scheduler. Applications that were already submitted never finish, and new applications cannot be submitted. Everything hangs except the RM and NM server processes themselves. How can I fix this? Please help!

       

      Here is the log:

      ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
      2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
      

      Attachments

        1. RM_normal_state.stack
          341 kB
          jufeng li
        2. RM_unnormal_state.stack
          486 kB
          jufeng li

            People

              Assignee: Unassigned
              Reporter: jufeng li
              Votes: 0
              Watchers: 7
