  1. Slider
  2. SLIDER-939

flex down does not cancel the outstanding request

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Slider 0.80
    • Fix Version/s: Slider 1.0.0
    • Component/s: core
    • Labels:
    • Environment:

      Hadoop 2.7.1
      Slider 0.80.0

    • Flags:
      Important

      Description

      I run a Slider app on a 6-node cluster. To ensure there is only one component (worker) instance on each node, I set yarn.memory to 51% of the total memory.
      Then I flexed up to 7 workers. This leaves one outstanding worker request that can never be satisfied, which is expected.
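
The arithmetic behind this setup can be sketched as follows. This is only an illustration: it assumes the 51% refers to each node's memory, uses a hypothetical node size of 8192 MB, and ignores the application master's own container and any scheduler rounding. The function names are made up for this example.

```python
def max_containers_per_node(node_memory_mb: int, container_memory_mb: int) -> int:
    """Number of containers of the given size that fit on one node, by memory alone."""
    return node_memory_mb // container_memory_mb


def unsatisfiable_requests(nodes: int, node_memory_mb: int,
                           container_memory_mb: int, desired_instances: int) -> int:
    """Number of requested instances that the cluster can never place."""
    capacity = nodes * max_containers_per_node(node_memory_mb, container_memory_mb)
    return max(0, desired_instances - capacity)


NODE_MEMORY_MB = 8192  # hypothetical node size, not taken from the report
# 51% of the node memory, rounded up with integer arithmetic.
WORKER_MEMORY_MB = (NODE_MEMORY_MB * 51 + 99) // 100

per_node = max_containers_per_node(NODE_MEMORY_MB, WORKER_MEMORY_MB)
stuck_at_7 = unsatisfiable_requests(6, NODE_MEMORY_MB, WORKER_MEMORY_MB, 7)
stuck_at_6 = unsatisfiable_requests(6, NODE_MEMORY_MB, WORKER_MEMORY_MB, 6)
```

Under these assumptions each node holds exactly one worker, so six nodes can host six workers and the seventh request stays outstanding.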

      Then I flexed back down to 6 workers. After that, every container request from any job was blocked, even though plenty of memory and cores were available. The RM log shows this message repeated continuously:
      capacity.CapacityScheduler (CapacityScheduler.java:allocateContainersToNode(1240)) - Skipping scheduling since node test.example.com:45454 is reserved by application appattempt_1442384698868_0008_000001

      It seems that on flex down the outstanding request is not actually cancelled in the container request queue, so it keeps being requested.
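
The behavior the report expects can be sketched as a small reconciliation step: on flex down, outstanding requests are cancelled first, and running containers are released only if that is not enough. This is not Slider's actual implementation; `FlexPlan` and `plan_flex` are hypothetical names for illustration. In a real application master, the "cancel" step would presumably correspond to removing the pending request from the YARN AM-RM client (for example via `AMRMClient.removeContainerRequest`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlexPlan:
    cancel_requests: int      # outstanding requests to withdraw
    release_containers: int   # running containers to release
    new_requests: int         # additional containers to request


def plan_flex(desired: int, running: int, outstanding: int) -> FlexPlan:
    """Reconcile the desired instance count with running plus outstanding."""
    if min(desired, running, outstanding) < 0:
        raise ValueError("counts must be non-negative")

    delta = desired - (running + outstanding)
    if delta > 0:
        # Flex up: ask for the missing instances.
        return FlexPlan(cancel_requests=0, release_containers=0, new_requests=delta)

    # Flex down (or no change): withdraw outstanding requests before
    # touching any running container.
    excess = -delta
    cancel = min(excess, outstanding)
    return FlexPlan(cancel_requests=cancel,
                    release_containers=excess - cancel,
                    new_requests=0)


# Scenario from this report: 6 running workers, 1 outstanding request.
down_to_6 = plan_flex(desired=6, running=6, outstanding=1)
down_to_5 = plan_flex(desired=5, running=6, outstanding=1)
```

With this ordering, flexing from 7 back down to 6 would only cancel the unsatisfiable request and leave the six running workers untouched.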

      After I flexed down to 5 workers, the blocked jobs could run.
      This is related to SLIDER-490: https://issues.apache.org/jira/browse/SLIDER-490

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Youjie Chen (yjchen)
              • Votes: 0
              • Watchers: 5
