  1. Hadoop YARN
  2. YARN-3929

Uncleaning option for local app log files with log-aggregation feature

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0, 2.6.0
    • Fix Version/s: None
    • Component/s: log-aggregation
    • Labels:
      None
    • Target Version/s:

      Description

      Although it makes sense to delete local app log files once AppLogAggregator has copied all of them to the remote location (HDFS), I have some use cases that need the local app log files to remain after they are copied to HDFS, mostly for our own backup purposes. I would like to use YARN's log-aggregation feature and also back up the app log files. Without this option, the files have to be copied from HDFS back to local disk.

      1. YARN-3929.02.patch
        11 kB
        Dongwook Kwon

        Activity

        dongwook Dongwook Kwon added a comment -

        The reason is that we already had a tool similar to the log aggregator, built outside of Hadoop and not only for YARN. It was designed for Hadoop 1, which didn't have a native log-aggregation feature. In our cluster, each node runs a daemon that periodically checks the local application logs and pushes them to S3; it works fine even with 2000 nodes. The issue we have now is with YARN's log aggregation: as you can imagine, the two systems try to do the same thing. Other internal users want to use YARN's log aggregation for tools such as HUE or the "yarn logs --applicationId" command, and we still need to support Hadoop 1. So whenever a cluster turns on YARN's log aggregation, we don't have the application logs for troubleshooting. This has been an issue for a long time, and the simple solution for our team is to make the cleanup optional, as I suggested. I agree that it may not be useful for most use cases, so I made cleaning up the default and made sure a test covers it.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Dongwook Kwon, how do you perform the backup you mention on a large cluster? Isn't it easier to copy the logs from HDFS than from 1000 machines? I don't see why the latter is desirable.

        varun_saxena Varun Saxena added a comment -

        Yes, the debug delay configuration would affect every file submitted to the deletion service, including the localized ones.

        dongwook Dongwook Kwon added a comment -

        Thanks, Xuan, for the information.
        I took a quick look at yarn.nodemanager.delete.debug-delay-sec and tested it. It appears the setting acts on DeletionService, which means it delays the deletion of all local files that DeletionService is supposed to delete. Is that right? I do want to keep the application logs for my own backup and troubleshooting, but not the other files, such as the application's localized files, usercache, filecache, nmPrivate, spilled files, etc.; I would like those deleted as quickly as possible. Please correct me if I have misunderstood yarn.nodemanager.delete.debug-delay-sec.
        I couldn't find exactly what I want. If there is any option that lets me keep only the application logs locally with the log-aggregation feature, I would just use it and close this case.
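
For illustration only, here is a minimal Java sketch of the distinction described above. It is a toy model, not YARN code and not the attached patch: the class, the file kinds, the sample paths, and the `keepLocalLogs` switch are all hypothetical names. The only real setting it models is yarn.nodemanager.delete.debug-delay-sec, under the assumption (consistent with this thread) that a positive delay postpones every deletion handed to DeletionService.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of NodeManager local cleanup; not actual YARN code. */
public class LocalCleanupSketch {

    /** Kinds of local paths a finished application leaves behind. */
    public enum Kind { APP_LOG, USERCACHE, FILECACHE, NM_PRIVATE }

    /** A local path tagged with its kind. Paths are placeholders. */
    public static final class LocalFile {
        public final String path;
        public final Kind kind;

        public LocalFile(String path, Kind kind) {
            this.path = path;
            this.kind = kind;
        }
    }

    /** One placeholder file of each kind. */
    public static List<LocalFile> sampleFiles() {
        return Arrays.asList(
            new LocalFile("logs/app_1/container_1/stderr", Kind.APP_LOG),
            new LocalFile("usercache/alice/appcache/app_1", Kind.USERCACHE),
            new LocalFile("filecache/10/job.jar", Kind.FILECACHE),
            new LocalFile("nmPrivate/app_1/tokens", Kind.NM_PRIVATE));
    }

    /**
     * Returns the files removed right away once the application finishes
     * and its logs are uploaded.
     *
     * @param debugDelaySec models yarn.nodemanager.delete.debug-delay-sec
     * @param keepLocalLogs HYPOTHETICAL switch standing in for the option
     *                      requested in this issue
     */
    public static List<LocalFile> deletedImmediately(
            List<LocalFile> files, long debugDelaySec, boolean keepLocalLogs) {
        List<LocalFile> deleted = new ArrayList<>();
        for (LocalFile f : files) {
            if (debugDelaySec > 0) {
                continue; // global delay: nothing is removed yet, logs or not
            }
            if (keepLocalLogs && f.kind == Kind.APP_LOG) {
                continue; // hypothetical switch: only app logs are spared
            }
            deleted.add(f);
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<LocalFile> files = sampleFiles();
        System.out.println("default cleanup:  "
            + deletedImmediately(files, 0, false).size() + " deleted");
        System.out.println("debug delay 600s: "
            + deletedImmediately(files, 600, false).size() + " deleted");
        System.out.println("keep local logs:  "
            + deletedImmediately(files, 0, true).size() + " deleted");
    }
}
```

In this model the debug delay is all-or-nothing, while the hypothetical per-log switch leaves the caches and nmPrivate data on the normal cleanup path, which is the behavior being asked for.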

        xgong Xuan Gong added a comment -

        Dongwook Kwon
        Does the yarn.nodemanager.delete.debug-delay-sec configuration satisfy your requirement?

        dongwook Dongwook Kwon added a comment -

        Could you review this patch? Thanks.


          People

          • Assignee:
            Unassigned
            Reporter:
            dongwook Dongwook Kwon
          • Votes:
            0
            Watchers:
            8