Hadoop YARN / YARN-9480

createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

      Description

      Currently, when startContainers() is called for an application that the NM does not yet know about, the NM enters the INIT_APPLICATION step. During application initialization, createAppDir() is executed, and it is a blocking operation.

      createAppDir() has to interact with an external file system, so its latency depends on the SLA of that file system. When the external file system has high latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (I have seen a case where the NM was stuck here for more than an hour.)
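
To illustrate the stall, here is a minimal sketch that uses a single-threaded executor as a stand-in for the dispatcher thread. It is not the actual NM dispatcher code; the class name, method name, and latch names are hypothetical. The first handler blocks as if waiting on a slow file system, and a later event cannot be handled until it returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DispatcherStallSketch {

    /**
     * Returns true if a later event was still pending while the first
     * handler was blocked on the (simulated) external file system.
     */
    public static boolean laterEventStallsBehindBlockedHandler() {
        // One thread handles all events, as a stand-in for the dispatcher.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        CountDownLatch handlerStarted = new CountDownLatch(1);
        CountDownLatch remoteFsResponded = new CountDownLatch(1);
        try {
            // Simulated INIT_APPLICATION handler that blocks on remote I/O.
            dispatcher.submit(() -> {
                handlerStarted.countDown();
                remoteFsResponded.await(); // blocks until the "file system" answers
                return null;
            });
            // Any later event queues up behind the blocked handler.
            Future<String> laterEvent = dispatcher.submit(() -> "later event handled");

            handlerStarted.await();
            boolean stalled = !laterEvent.isDone(); // the only thread is still busy

            remoteFsResponded.countDown(); // the slow call finally returns
            laterEvent.get();              // now the later event can complete
            return stalled;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdownNow();
        }
    }
}
```

In this sketch the later event stays pending for exactly as long as the simulated remote call takes, which is the behavior described above.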

      I think it would be more reasonable to defer createAppDir() until the logs are actually uploaded, which happens in other threads. In addition, depending on the logRetentionPolicy, many containers may never reach the upload step, so this would also save a lot of interactions with the external file system.
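
The proposal could look roughly like the following sketch, under stated assumptions: RemoteFs, AppLogAggregator, and the retentionPolicyKeepsLogs flag are hypothetical stand-ins, not the real LogAggregationService classes or API. The point is only that init() does no remote I/O, and the directory is created lazily, at most once, on the upload path.

```java
import java.io.IOException;

public class LazyAppDirSketch {

    /** Hypothetical stand-in for the external file system client. */
    public interface RemoteFs {
        void createAppDir(String appId) throws IOException;
    }

    /** Hypothetical per-application aggregator; not the real NM class. */
    public static final class AppLogAggregator {
        private final String appId;
        private final RemoteFs fs;
        private boolean appDirReady = false;

        public AppLogAggregator(String appId, RemoteFs fs) {
            this.appId = appId;
            this.fs = fs;
        }

        /** Runs on the dispatcher thread: in-memory setup only, no remote I/O. */
        public void init() {
            // Intentionally empty in this sketch.
        }

        /** Runs on an upload thread. Returns true if the upload step was reached. */
        public synchronized boolean uploadLogs(boolean retentionPolicyKeepsLogs) {
            if (!retentionPolicyKeepsLogs) {
                // Nothing to upload, so the remote directory is never created.
                return false;
            }
            try {
                if (!appDirReady) {
                    fs.createAppDir(appId); // possibly slow, but off the dispatcher thread
                    appDirReady = true;     // set only after success, so a failure is retried
                }
            } catch (IOException e) {
                return false; // a real implementation would log and handle this
            }
            // The actual log upload would go here.
            return true;
        }
    }
}
```

One trade-off of this design is that a permission or configuration problem with the remote directory would only surface at upload time rather than at application init.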

            People

            • Assignee: yoelee liyakun
            • Reporter: yoelee liyakun