Details

    • Type: Sub-task
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      Currently, if log aggregation is enabled for a cluster, logs for all jobs are aggregated. This leads to a large number of files on HDFS that users may not want.
      Users should be able to control this per job, along with the aggregation policy (for example, failed only or all).
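A minimal sketch of what a per-job switch plus policy could look like, assuming the job carries a single string setting that overrides the cluster default. Everything here is hypothetical and illustrative: `JobLogAggregationPolicy`, `PerJobAggregation`, and the property name `example.job.log-aggregation.policy` are not the real YARN or MapReduce API. Note that the FAILED_ONLY branch treats a non-zero container exit status as failure, which is only a proxy, as the discussion below points out.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical per-job policies; these names are illustrative and are not
// the actual YARN or MapReduce API.
enum JobLogAggregationPolicy {
    NONE,        // never upload this job's container logs
    ALL,         // upload logs from every container
    FAILED_ONLY  // upload logs only from containers with a non-zero exit status
}

final class PerJobAggregation {
    // Hypothetical job-level property name, used only for this sketch.
    static final String POLICY_KEY = "example.job.log-aggregation.policy";

    private PerJobAggregation() {}

    // Returns the job's requested policy, or the cluster-wide default if unset.
    static JobLogAggregationPolicy policyFor(Map<String, String> jobConf,
                                             JobLogAggregationPolicy clusterDefault) {
        String value = jobConf.get(POLICY_KEY);
        if (value == null || value.isBlank()) {
            return clusterDefault;
        }
        // Throws IllegalArgumentException for an unrecognized policy name.
        return JobLogAggregationPolicy.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Decides whether one container's logs should be uploaded to HDFS.
    static boolean shouldAggregate(JobLogAggregationPolicy policy, int containerExitStatus) {
        switch (policy) {
            case NONE:
                return false;
            case ALL:
                return true;
            case FAILED_ONLY:
                // Exit status is only a proxy for task failure.
                return containerExitStatus != 0;
            default:
                throw new IllegalStateException("unknown policy: " + policy);
        }
    }
}
```

With an empty job configuration the cluster default applies; setting the hypothetical key to `failed_only` makes the NodeManager upload only the logs of containers that exited with a non-zero status.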

          Activity

          Vinod Kumar Vavilapalli added a comment:

          +1. The infrastructure for this is all there, including tests IIRC; we just need to expose it to users.

          Siddharth Seth added a comment:

          In the initial patch, I am planning to support only per-job enabling or disabling of aggregation, along with an option to keep the logs on the NM for some time if the job disables aggregation.
          Something like FAILED_ONLY will not work very well right now, since a container's exit status is not necessarily an indication of whether a task completed successfully.
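The initial plan (on/off per job, plus local retention on the NM when aggregation is off) can be sketched roughly as follows. This is a hypothetical illustration, not the NodeManager's actual code: `AppLogRetention`, its methods, and the idea of a per-job retention duration passed in directly are assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: a job either enables or disables aggregation, and a job
// that disables it can ask the NodeManager to keep its local logs for a while.
// Names are illustrative, not the real NodeManager code.
final class AppLogRetention {
    private AppLogRetention() {}

    // How long the NM keeps an application's local logs once it is done with them
    // (after the HDFS upload when aggregating, after the application finishes otherwise).
    static Duration localRetention(boolean aggregationEnabled, Duration requestedRetention) {
        if (aggregationEnabled) {
            // The aggregated copy on HDFS replaces the local one.
            return Duration.ZERO;
        }
        if (requestedRetention.isNegative()) {
            throw new IllegalArgumentException("retention must not be negative");
        }
        return requestedRetention;
    }

    // Earliest instant at which the local logs may be deleted.
    static Instant deletionTime(Instant doneAt, boolean aggregationEnabled,
                                Duration requestedRetention) {
        return doneAt.plus(localRetention(aggregationEnabled, requestedRetention));
    }
}
```

Keeping the retention decision separate from the deletion time makes it easy to reuse the same rule whether the reference point is the end of the upload or the end of the application.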

          Robert Joseph Evans added a comment:

          Once YARN-221 goes in, we should be able to expose much more functionality, such as aggregating logs for only a sample of successful jobs, or only for failed jobs.
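One way a "sample of successful jobs" policy might work, sketched under stated assumptions: failed jobs are always aggregated, and successful jobs are kept when a hash of the job ID falls below the sample rate. Hashing the ID (rather than drawing a random number) is a design choice made here so that every NodeManager running containers for the same job reaches the same decision. `SampledAggregation` and its signature are hypothetical, not YARN-221's API.

```java
// Hypothetical sampling policy: always keep logs of failed jobs and keep a
// deterministic fraction of successful ones, chosen by hashing the job ID so
// every NodeManager makes the same decision for the same job.
final class SampledAggregation {
    private static final int BUCKETS = 10_000;

    private SampledAggregation() {}

    static boolean shouldAggregate(String jobId, boolean jobFailed, double successSampleRate) {
        if (successSampleRate < 0.0 || successSampleRate > 1.0) {
            throw new IllegalArgumentException("sample rate must be in [0, 1]");
        }
        if (jobFailed) {
            return true;
        }
        // String.hashCode is specified by the JLS, so it is stable across JVMs;
        // floorMod maps it onto [0, BUCKETS) even for negative hash values.
        int bucket = Math.floorMod(jobId.hashCode(), BUCKETS);
        return bucket < successSampleRate * BUCKETS;
    }
}
```

A rate of 0.0 keeps only failed jobs, and 1.0 keeps every job.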

          Chris Trezzo added a comment:

          I have submitted a patch on YARN-221, and I am also starting on the MapReduce-level changes to make log aggregation configurable per job. I hope to post a patch soon.

          Lohit Vijayarenu added a comment:

          The patch looks good to me. Can someone else also take a look at it?

          Ming Ma added a comment:

          Regarding Seth's comment that "container exit status is not necessarily an indication of whether a task completes successfully or not", https://issues.apache.org/jira/browse/MAPREDUCE-5465 should fix this issue.


            People

            • Assignee: Chris Trezzo
            • Reporter: Siddharth Seth
            • Votes: 0
            • Watchers: 8
