Details

    • Type: Bug Bug
    • Status: Reopened
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.4
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      Currently, if an NM restarts - existing logs will be left around till they're manually cleaned up. The NM could be improved to handle these files.

        Issue Links

          Activity

          Hide
          Omkar Vinit Joshi added a comment -

          we may or may not want to delete the logs when NM restarts.

          • if Application is still running then we may not want to delete the logs and wait till the application finishes or we may just aggregate logs during restart.
          • if application is finished then there should be someway to specify whether or not to aggregate / remove logs.

          any thoughts?

          Show
          Omkar Vinit Joshi added a comment - we may or may not want to delete the logs when NM restarts. if Application is still running then we may not want to delete the logs and wait till the application finishes or we may just aggregate logs during restart. if application is finished then there should be someway to specify whether or not to aggregate / remove logs. any thoughts?
          Hide
          Bikas Saha added a comment -

          Why does the NM upload logs when it the container completes? It does not need to wait for app completion. It can use HDFS append to append the logs to the same file. This is safe since NM should be the single writer.
          NM could then delete these container logs after uploading them. Risk is duplicate data whenever NM restarts while it was in the middle of uploading a particular log.

          Show
          Bikas Saha added a comment - Why does the NM upload logs when it the container completes? It does not need to wait for app completion. It can use HDFS append to append the logs to the same file. This is safe since NM should be the single writer. NM could then delete these container logs after uploading them. Risk is duplicate data whenever NM restarts while it was in the middle of uploading a particular log.
          Hide
          Jason Lowe added a comment -

          The NM waits not only for the container to complete but for the entire application to complete – see YARN-219. Holding long-lived leases on many files in HDFS puts a lot of load on the namenode.

          It also cannot append "on the fly" since all the logs for all containers for an application on the node are in a single file in HDFS with the data for each log being contiguous within that file. Adding the ability to append to multiple log streams simultaneously is not possible in the current aggregated log format.

          It would be nice to have some mechanism to get the NM to clean up logs, as currently each time the NM restarts log files are being leaked. This has been fixed for container local directories and the distributed cache via YARN-71, but logs have been ignored. Seems like we should be consistent about these two. If the application is still running, isn't YARN-71 already deleting the app's current working directory and distcache files out from underneath it?

          Show
          Jason Lowe added a comment - The NM waits not only for the container to complete but for the entire application to complete – see YARN-219 . Holding long-lived leases on many files in HDFS puts a lot of load on the namenode. It also cannot append "on the fly" since all the logs for all containers for an application on the node are in a single file in HDFS with the data for each log being contiguous within that file. Adding the ability to append to multiple log streams simultaneously is not possible in the current aggregated log format. It would be nice to have some mechanism to get the NM to clean up logs, as currently each time the NM restarts log files are being leaked. This has been fixed for container local directories and the distributed cache via YARN-71 , but logs have been ignored. Seems like we should be consistent about these two. If the application is still running, isn't YARN-71 already deleting the app's current working directory and distcache files out from underneath it?

            People

            • Assignee:
              Omkar Vinit Joshi
              Reporter:
              Siddharth Seth
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:

                Development