  1. Hadoop YARN
  2. YARN-9480

createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

    Description

      At present, when startContainers() is called and the NM does not yet know the application, it enters the INIT_APPLICATION step. During application initialization, createAppDir() is executed, and it is a blocking operation.

      createAppDir() has to interact with an external file system, so its latency depends on the SLA of that file system. When the external file system has high latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In fact, I have seen a case where the NM was stuck here for more than an hour.)

      I think it would be more reasonable to move createAppDir() to the point where the logs are actually uploaded (in other threads). In addition, depending on the logRetentionPolicy, many of the containers may never reach this step, which would save a lot of interactions with the external file system.
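
      The idea above can be sketched roughly as follows. This is only a minimal illustration and not the code in the attached patch: LazyAppDirSketch, RemoteFs, AppLogAggregator, initApp(), uploadLogs() and ensureAppDir() are hypothetical names, the remote path is made up, and the retention decision is reduced to a boolean. The point is that the dispatcher-side init does no remote I/O, and the remote directory is created at most once, on an upload thread, and only when logs are really going to be uploaded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyAppDirSketch {

    /** Hypothetical stand-in for the remote file system call behind createAppDir(). */
    interface RemoteFs {
        void mkdirs(String path) throws Exception;
    }

    /** Hypothetical per-application aggregator; not the real NodeManager class. */
    static class AppLogAggregator {
        private final String remoteAppDir;
        private final RemoteFs fs;
        private final ExecutorService uploadPool;
        private boolean appDirCreated = false; // guarded by "this"

        AppLogAggregator(String remoteAppDir, RemoteFs fs, ExecutorService uploadPool) {
            this.remoteAppDir = remoteAppDir;
            this.fs = fs;
            this.uploadPool = uploadPool;
        }

        /** Runs on the dispatcher thread: local bookkeeping only, no remote I/O. */
        void initApp() {
            // Intentionally does not touch the remote file system.
        }

        /**
         * Schedules the upload on the pool. The remote directory is created there,
         * and only if the (simplified) retention decision says logs are kept.
         * The returned future yields true if an upload was attempted.
         */
        Future<Boolean> uploadLogs(boolean shouldUpload) {
            if (!shouldUpload) {
                // Nothing to upload, so the remote directory is never created.
                return CompletableFuture.completedFuture(false);
            }
            return uploadPool.submit(() -> {
                ensureAppDir();
                // The actual log upload would go here.
                return true;
            });
        }

        /** Creates the remote directory on first use; retried later if mkdirs throws. */
        private synchronized void ensureAppDir() throws Exception {
            if (!appDirCreated) {
                fs.mkdirs(remoteAppDir);
                appDirCreated = true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger mkdirCalls = new AtomicInteger();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AppLogAggregator agg = new AppLogAggregator(
                "/hypothetical/remote/app_1", path -> mkdirCalls.incrementAndGet(), pool);
            agg.initApp();               // dispatcher thread, returns immediately
            agg.uploadLogs(false).get(); // retention says skip: no remote call
            agg.uploadLogs(true).get();  // first real upload creates the directory
            agg.uploadLogs(true).get();  // later uploads reuse it
            System.out.println("remote mkdirs calls: " + mkdirCalls.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

      With this shape, a slow external file system only delays the upload thread, while the dispatcher thread keeps processing other events.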

      Attachments

        1. YARN-9480.001.patch
          41 kB
          Yunyao Zhang

          People

            Yunyao Zhang
            yoelee liyakun
            Votes: 1
            Watchers: 6

            Dates

              Created:
              Updated: