  Hadoop YARN / YARN-6728

Jobs run slowly when defaultFs performance degrades and log aggregation is enabled


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: nodemanager, yarn
    • Labels: None
    • Environment: CentOS 7.1, hadoop-2.7.1

    Description

      In our cluster, I found that many map tasks stay in the "NEW" state for several minutes. Here is the log for one of their containers:

      [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_000011 to application application_1495632926847_2459604
      [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
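The length of the stall can be read directly from the two timestamps above. This small Java check uses only `java.time`; the class and method names are illustrative. It computes how long the container stayed in NEW:

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.Locale;

public class TransitionDelay {
    // Timestamps copied from the two NodeManager log lines above.
    static final String ADDED = "2017-06-13T18:21:23.068+08:00";
    static final String LOCALIZING = "2017-06-13T18:23:08.715+08:00";

    /** Returns how long the container stayed in NEW, in milliseconds. */
    static long newStateMillis(String addedTs, String localizingTs) {
        // OffsetDateTime.parse accepts ISO-8601 timestamps with an offset, as logged.
        OffsetDateTime start = OffsetDateTime.parse(addedTs);
        OffsetDateTime end = OffsetDateTime.parse(localizingTs);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long ms = newStateMillis(ADDED, LOCALIZING);
        // Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale.
        System.out.printf(Locale.ROOT, "NEW -> LOCALIZING took %d ms (%.1f s)%n", ms, ms / 1000.0);
    }
}
```

Running it prints `NEW -> LOCALIZING took 105647 ms (105.6 s)`. That is almost two minutes before localization could even begin.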
      

      I then searched the log between 18:21:23.068 and 18:23:08.715 (about 105 seconds) and found that some events handled by the AsyncDispatcher ran slowly because their handlers access the defaultFs. Our cluster has grown to 4k nodes, which has increased the load on the defaultFs. (Note: log aggregation is enabled.)
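A plain single-thread executor shows why one slow handler delays an unrelated container. Like the single "AsyncDispatcher event handler" thread seen in the log, it processes queued events one at a time in FIFO order. This is a simplified Java model, not Hadoop's AsyncDispatcher. `slowFsCall` is a hypothetical stand-in that just sleeps to mimic a slow defaultFs call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HeadOfLineBlocking {
    /** Hypothetical stand-in for a handler that blocks on a slow defaultFs call. */
    static void slowFsCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns how long a cheap event queued behind a slow one waited, in ms. */
    static long waitBehindSlowEvent(long fsLatencyMillis) throws Exception {
        // One worker thread draining a FIFO queue, like a single dispatcher thread.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            long submittedAt = System.nanoTime();
            dispatcher.submit(() -> slowFsCall(fsLatencyMillis)); // e.g. app init with remote mkdir
            Callable<Long> cheapEvent = System::nanoTime;          // e.g. a NEW -> LOCALIZING transition
            Future<Long> handledAt = dispatcher.submit(cheapEvent);
            // Future.get() also guarantees we see the value written by the worker thread.
            return TimeUnit.NANOSECONDS.toMillis(handledAt.get() - submittedAt);
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // With a 500 ms filesystem call, the unrelated event waits roughly 500 ms.
        System.out.println("cheap event waited ~" + waitBehindSlowEvent(500) + " ms");
    }
}
```

The cheap event does no filesystem work of its own, yet it inherits the full latency of the call queued ahead of it.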

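      When the first container of an application starts on a NodeManager, the NodeManager invokes initApp(), which calls verifyAndCreateRemoteLogDir() and then creates the application's remote log directory with mkdir. Both operations access the defaultFs. Because they run while the AsyncDispatcher is handling the event, the container is stuck in NEW until they return, and the application runs slowly.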
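One possible mitigation is to move the defaultFs calls off the dispatcher thread. The Java sketch below is hypothetical. It is not the attached patch, and it does not use Hadoop's LogAggregationService or FileSystem classes. `RemoteDirCreator` is an assumed interface standing in for verifyAndCreateRemoteLogDir() and the application-directory mkdir, and the pool size of 4 is arbitrary. The sketch also assumes that nothing on the dispatcher has to wait for the remote directory to exist before the container can move on to LOCALIZING.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncRemoteLogDirInit {
    /** Hypothetical stand-in for the defaultFs work done by initApp(). */
    interface RemoteDirCreator {
        void verifyAndCreate(String appId) throws Exception;
    }

    private final RemoteDirCreator creator;
    // Dedicated pool, so a slow defaultFs call never occupies the dispatcher thread.
    private final ExecutorService fsPool = Executors.newFixedThreadPool(4);
    // One future per application: repeated calls for the same app share a single init.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> inits = new ConcurrentHashMap<>();

    AsyncRemoteLogDirInit(RemoteDirCreator creator) {
        this.creator = creator;
    }

    /** Safe to call on the dispatcher thread: it only schedules the work and returns. */
    CompletableFuture<Void> initAppAsync(String appId) {
        return inits.computeIfAbsent(appId, id -> CompletableFuture.runAsync(() -> {
            try {
                creator.verifyAndCreate(id);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, fsPool));
    }

    void shutdown() {
        fsPool.shutdown();
    }

    public static void main(String[] args) {
        // Simulate a defaultFs that takes 300 ms per application init.
        AsyncRemoteLogDirInit init = new AsyncRemoteLogDirInit(appId -> Thread.sleep(300));
        long t0 = System.nanoTime();
        CompletableFuture<Void> done = init.initAppAsync("application_1495632926847_2459604");
        long returnedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("initAppAsync returned after ~" + returnedMs + " ms");
        done.join(); // whatever needs the remote dir (e.g. the later upload) waits here instead
        init.shutdown();
    }
}
```

A real version would also need to report failures, since in this sketch a failed future stays cached for that application. It would also need to bound the number of pending directory creations.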

      Attachments

        1. YARN-6728.patch.00_branch-2.7 (4 kB, Chenyu Zheng)


          People

            Assignee: Unassigned
            Reporter: Chenyu Zheng (zhengchenyu)
            Votes: 0
            Watchers: 4


              Time Tracking

                Original Estimate: 1m
                Remaining Estimate: 1m
                Time Spent: Not Specified