  1. Hadoop YARN
  2. YARN-431 [Umbrella] Complete/Stabilize YARN application log-handling
  3. YARN-1440

Yarn aggregated logs are difficult for external tools to understand


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The log aggregation feature in Yarn is awesome! However, the file format into which the log files are aggregated (TFile) should either be much simpler or be made pluggable. The current TFile format forces anyone who wants to see the files to either
      a) use the web UI
      b) use the CLI tools (yarn logs) or
      c) write custom code to read the files

      My suggestion would be to simplify the log collection by collecting and writing the raw log files into a directory structure as follows:

      /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} 
      

      This way the application developers can (re)use a much wider array of tools to process the logs.
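
      To illustrate the point, here is a minimal sketch of how an external tool could consume the proposed layout with nothing but standard file APIs. It assumes the logs sit on a locally mounted path laid out exactly as suggested above; the names iter_container_logs and grep_logs are hypothetical helpers made up for this example and are not part of Yarn.

```python
from pathlib import Path
from typing import Iterator, List, NamedTuple, Tuple


class LogFile(NamedTuple):
    app_id: str
    container_id: str
    name: str
    path: Path


def iter_container_logs(log_collection_dir: str) -> Iterator[LogFile]:
    """Yield every raw log file under {app-id}/{container-id}/{log-file-name}.

    Hypothetical helper: assumes the directory layout proposed in this issue.
    """
    root = Path(log_collection_dir)
    # Exactly three levels below the root: app id, container id, log file name.
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            app_id, container_id, name = path.relative_to(root).parts
            yield LogFile(app_id, container_id, name, path)


def grep_logs(log_collection_dir: str, needle: str) -> List[Tuple[str, str, str]]:
    """Return (container_id, log_file_name, line) for each line containing needle."""
    hits = []
    for log in iter_container_logs(log_collection_dir):
        # Raw text files can be read directly; no TFile reader is involved.
        with log.path.open(errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((log.container_id, log.name, line.rstrip("\n")))
    return hits
```

      With a layout like this, the same result could also be had from ordinary tools such as grep or find, which is the main argument for the simpler structure.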

      For readers who are not familiar with the aggregated logs and their format, you can find more info in the following two blog posts:
      http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
      http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/

          People

            Assignee: Unassigned
            Reporter: ledion bitincka
            Votes: 2
            Watchers: 18
