Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      The log aggregation feature in YARN is awesome! However, the file format into which the log files are aggregated (TFile) should either be much simpler or be made pluggable. The current TFile format forces anyone who wants to see the logs to either
      a) use the web UI
      b) use the CLI tools (yarn logs) or
      c) write custom code to read the files

      My suggestion would be to simplify the log collection by collecting and writing the raw log files into a directory structure as follows:

      /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} 
      

      This way the application developers can (re)use a much wider array of tools to process the logs.
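
      For illustration, here is a minimal sketch (not part of the proposal itself) of how logs in such a layout could be read back with nothing more than the standard Hadoop FileSystem API; the log-collection dir and app id below are made-up placeholders:

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class RawLogReader {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          // Hypothetical layout: /{log-collection-dir}/{app-id}/{container-id}/{log-file-name}
          Path appDir = new Path("/logs/application_1385_0001");
          for (FileStatus container : fs.listStatus(appDir)) {
            for (FileStatus log : fs.listStatus(container.getPath())) {
              System.out.println("=== " + log.getPath() + " ===");
              // Each log is a plain file, so any HDFS-aware tool can read it
              try (BufferedReader in = new BufferedReader(
                  new InputStreamReader(fs.open(log.getPath())))) {
                String line;
                while ((line = in.readLine()) != null) {
                  System.out.println(line);
                }
              }
            }
          }
        }
      }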

      For readers who are not familiar with aggregated logs and their format, you can find more info in the following two blog posts:
      http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
      http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/

        Activity

        Jason Lowe added a comment -

        My suggestion would be to simplify the log collection by collecting and writing the raw log files into a directory structure as follows

        I agree that approach would be simple, but it has a lot of issues at scale. One of the biggest issues with log aggregation on a large, busy cluster is the number of files it generates and the write load it places on the namenode. Storing the logs in HDFS 1-to-1 as they appear in the container log directories on the nodes would be a lot of files. Zillions of tiny files is not something HDFS does particularly well. We already have to set the log retention period lower than we'd like on some of our large, busy clusters due to the namespace pressure from aggregated logs, and it's already coalescing all of the logs for all of an app's containers that ran on a particular node.

        That being said, I totally agree the TFile format for aggregated logs is not very fun to wield as a user. I don't know the thought process that went into choosing it, but I suspect it was a straightforward way to aggregate all of an app's logfiles on a node into a single file in HDFS.

        Maybe one way to get the benefit of both easy-to-access logs and less namespace pressure is to go ahead and aggregate them as separate files but have a periodic process to archive logs in a har to reduce the namespace. That wouldn't address the significant additional write load this approach would place on the namenode, however.
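
        For instance, the periodic archiving step could reuse the existing hadoop-archives tool; a rough sketch, with made-up paths and archive name (a real job would also need scheduling and error handling):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.tools.HadoopArchives;
        import org.apache.hadoop.util.ToolRunner;

        public class ArchiveAggregatedLogs {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Roughly equivalent to the CLI:
            //   hadoop archive -archiveName logs-2013-11-18.har \
            //       -p /app-logs/user application_1385_0001 /app-logs-archive/user
            // All paths and names here are illustrative placeholders.
            int rc = ToolRunner.run(conf, new HadoopArchives(conf), new String[] {
                "-archiveName", "logs-2013-11-18.har",
                "-p", "/app-logs/user",
                "application_1385_0001",
                "/app-logs-archive/user" });
            System.exit(rc);
          }
        }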

        Sandy Ryza added a comment -

        Would it be helpful for YARN to supply a public API that reads the files for you?

        ledion bitincka added a comment -

        Storing the logs in HDFS 1-to-1 as they appear in the container log directories on the nodes would be a lot of files.

        Jason Lowe - from my understanding the NodeManager creates one TFile for each container executed, within which it then encodes and stores all the log files that the container created. For example, for an MR application the TFile would contain stdout, stderr and syslog - usually the first two are of size 0, while syslog contains the app's logs. Therefore, there's no real reduction in the number of files created. How common is it for other YARN apps to have more than one log file?

        Would it be helpful for YARN to supply a public API that reads the files for you?

        Sandy Ryza - that would be helpful; however, simple flat files would be the best API, since all the tools available for HDFS files would then be available for log files too.

        Jeff Zhang added a comment -

        @ledion, the current implementation writes one TFile per application, while your method would create at least one file per container, which would generate more files.
        I guess the reason the original author adopted TFile is that TFile has an index block that allows users to quickly look up a value. In this way, a user can quickly find one container's logs within an application's aggregated file.

        ledion bitincka added a comment -

        Jeff Zhang - I stand corrected: there's currently one TFile per application per node, with container_id used as the key. While this is better than creating one file per container, it still leaves the cluster exposed to the small-file problem. Imagine a 1,000-node cluster running 10,000 apps/day: with one file per app per node, that's up to 10M new TFiles per day. My hope is reduced complexity at the log-file level while punting the small-file problem to the FS layer - the reasoning being that not all filesystems that can be used with Hadoop have a small-file problem!

        Vinod Kumar Vavilapalli added a comment -

        That being said, I totally agree the TFile format for aggregated logs is not very fun to wield as a user. I don't know the thought process that went into choosing it, but I suspect it was a straightforward way to aggregate all of an app's logfiles on a node into a single file in HDFS.

        The original reason I picked TFile was programmatic access for users. With logs there are conflicting use cases: on one hand, users would like them to be human readable; on the other hand, people want to write tools against them. So I picked TFile for machine readability, together with a log dumper to facilitate human readability.

        Maybe one way to get the benefit of both easy-to-access logs and less namespace pressure is to go ahead and aggregate them as separate files but have a periodic process to archive logs in a har to reduce the namespace. That wouldn't address the significant additional write load this approach would place on the namenode, however.

        My hope is reduced complexity at the log file level while punting the small file problem to the FS layer - the reasoning here being that not all filesystems which can be used on Hadoop have a small file problem!

        Yes, because of the latter issue (NameNode load), we should think before we make this leap. HDFS is the dominant FS that people use for YARN+MR jobs, and YARN needs to work well there.

        Would it be helpful for YARN to supply a public API that reads the files for you?

        We already have this. See AggregatedLogFormat and LogCLIHelpers.
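
        For anyone reading along, a rough sketch of what using that API looks like; the call pattern is approximated from what LogCLIHelpers does and may differ between Hadoop versions:

        import java.io.DataInputStream;
        import java.io.EOFException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogKey;
        import org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader;

        public class DumpAggregatedLogs {
          public static void main(String[] args) throws Exception {
            // args[0]: one per-node, per-app aggregated log file in HDFS
            LogReader reader = new LogReader(new Configuration(), new Path(args[0]));
            try {
              LogKey key = new LogKey(); // the key is the container id
              DataInputStream valueStream = reader.next(key);
              while (valueStream != null) {
                System.out.println("Container: " + key);
                try {
                  while (true) {
                    // prints one log file (name, length, contents) per call
                    LogReader.readAContainerLogsForALogType(valueStream, System.out);
                  }
                } catch (EOFException e) {
                  // end of this container's logs
                }
                key = new LogKey();
                valueStream = reader.next(key);
              }
            } finally {
              reader.close();
            }
          }
        }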

        Once we have more power in HDFS, it is very likely that we'll change this to be a single file + directory structure.

        We can definitely move things around so that this per-node, per-app file concept applies only to HDFS, while some other implementation can use a single file. I am +1 if that is the goal - we just need to find and put the appropriate abstractions in place.

        ledion bitincka added a comment -

        How about allowing AppLogAggregator to be pluggable? This would most likely be a pretty simple patch; there's only one place where AppLogAggregatorImpl is instantiated, in LogAggregationService.java:

        // New application
        final AppLogAggregator appLogAggregator =
            new AppLogAggregatorImpl(this.dispatcher, this.deletionService,
                getConfig(), appId, userUgi, dirsHandler,
                getRemoteNodeLogFileForApp(appId, user), logRetentionPolicy,
                appAcls);
        if (this.appLogAggregators.putIfAbsent(appId, appLogAggregator) != null) {
          throw new YarnRuntimeException("Duplicate initApp for " + appId);
        }
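
        A minimal sketch of what that pluggability could look like, assuming a new, hypothetical configuration property (yarn.nodemanager.log-aggregator.class is invented here, not an existing YARN setting) and a shared construction path across implementations:

        // Hypothetical sketch only: pick the AppLogAggregator implementation
        // from configuration instead of hard-coding AppLogAggregatorImpl. The
        // property name is invented, and real implementations would need a
        // common constructor or init(...) method, since AppLogAggregatorImpl's
        // constructor takes many NodeManager-specific arguments.
        Class<? extends AppLogAggregator> aggregatorClass =
            getConfig().getClass("yarn.nodemanager.log-aggregator.class",
                AppLogAggregatorImpl.class, AppLogAggregator.class);
        final AppLogAggregator appLogAggregator =
            ReflectionUtils.newInstance(aggregatorClass, getConfig());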
        

          People

          • Assignee: Unassigned
          • Reporter: ledion bitincka
          • Votes: 1
          • Watchers: 14
