Hadoop YARN / YARN-8609

NM OOM because of large container statuses


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

    Description

      Sometimes the NodeManager sends very large container statuses to the ResourceManager when it starts up with recovery. As a result, the NodeManager fails to start because of an out-of-memory (OOM) error.
      In my case, the container statuses total 135M and contain 11 container statuses. I found that the diagnostics of 5 of those containers are very large (27M), so the attached patch truncates the container diagnostics.
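
The idea of truncating container diagnostics can be sketched as follows. This is only an illustration, not the content of YARN-8609.001.patch: the class `DiagnosticsTruncator`, the method `truncate`, the `MARKER` prefix, and the choice to keep the tail of the text (on the assumption that the most recent messages are the most useful) are all hypothetical.

```java
// Hypothetical helper; not part of Hadoop and not taken from the attached patch.
final class DiagnosticsTruncator {

    // Assumed marker that signals the text was cut.
    static final String MARKER = "[truncated] ";

    private DiagnosticsTruncator() {
    }

    /**
     * Returns the diagnostics unchanged when they fit within maxChars.
     * Otherwise keeps only the last maxChars characters (UTF-16 code units)
     * and prepends MARKER, so the result is at most
     * maxChars + MARKER.length() characters long.
     */
    static String truncate(String diagnostics, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        if (diagnostics == null) {
            return "";
        }
        if (diagnostics.length() <= maxChars) {
            return diagnostics;
        }
        // Keep the tail: drop everything before the last maxChars characters.
        int start = diagnostics.length() - maxChars;
        return MARKER + diagnostics.substring(start);
    }
}
```

Applying such a bound to each container's diagnostics before the statuses are stored or reported would cap the size of every status regardless of how much diagnostic text a container accumulated. For example, `DiagnosticsTruncator.truncate(text, 10000)` never returns more than `10000 + MARKER.length()` characters.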

      Attachments

        1. contain_status.jpg
          362 kB
          Xianghao Lu
        2. oom.jpeg
          224 kB
          Xianghao Lu
        3. YARN-8609.001.patch
          3 kB
          Xianghao Lu

            People

              Assignee: Unassigned
              Reporter: Xianghao Lu (luxianghao)
              Votes: 0
              Watchers: 4
