Hadoop Map/Reduce / MAPREDUCE-5932

Provide an option to use a dedicated reduce-side shuffle log

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.7.0
    • Component/s: mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      For reducers in large jobs, our users cannot easily spot the portions of the log associated with problems in their own code. An example reducer with INFO-level logging generates ~3500 lines (~700 KiB) of log per second. About 95% of the log comes from the client side of the shuffle, org.apache.hadoop.mapreduce.task.reduce.*:

      $ wc syslog 
          3642   48192  691013 syslog
      $ grep task.reduce syslog | wc 
          3424   46534  659038
      $ grep task.reduce.ShuffleScheduler syslog | wc 
          1521   17745  251458
      $ grep task.reduce.Fetcher syslog | wc 
          1045   15340  223683
      $ grep task.reduce.InMemoryMapOutput syslog | wc 
           400    4800   72060
      $ grep task.reduce.MergeManagerImpl syslog | wc 
           432    8200  106555
      

      Byte percentage breakdown:

      Shuffle total:           95%
      
      ShuffleScheduler:        36%
      Fetcher:                 32%
      InMemoryMapOutput:       10%
      MergeManagerImpl:        15%
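
      The percentages can be reproduced from the byte counts (third column) of the wc output. The following is a minimal sketch, not part of the patch; the names byte_share and breakdown are hypothetical, and rounding to the nearest integer is an assumption about how the figures were derived.

```python
# Byte counts copied from the third column of the `wc` output above.
TOTAL_BYTES = 691013
SHUFFLE_BYTES = {
    "Shuffle total": 659038,
    "ShuffleScheduler": 251458,
    "Fetcher": 223683,
    "InMemoryMapOutput": 72060,
    "MergeManagerImpl": 106555,
}


def byte_share(part, total):
    """Return part/total as a percentage, rounded to the nearest integer."""
    return round(100 * part / total)


def breakdown(counts=SHUFFLE_BYTES, total=TOTAL_BYTES):
    """Map each logger name to its share of the total syslog bytes."""
    return {name: byte_share(n, total) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in breakdown().items():
        print(f"{name + ':':<24} {pct}%")
```

      The four per-class shares sum to slightly less than the shuffle total because other classes in the same package also write to the log.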
      

      While this information is often useful for devops debugging shuffle performance issues, job users are often lost in it.

      We propose to have a dedicated syslog.shuffle file.
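
      Until such an option is available, the same separation can be approximated after the fact on an existing syslog. The sketch below is not how the patch works; it only mirrors the grep commands above by matching the shuffle package name literally. The function names and the output paths are hypothetical. Continuation lines that do not carry a logger name, such as stack traces, stay in the non-shuffle part.

```python
# Literal package prefix of the reduce-side shuffle classes named above.
SHUFFLE_PACKAGE = "org.apache.hadoop.mapreduce.task.reduce."


def split_shuffle_lines(lines, marker=SHUFFLE_PACKAGE):
    """Partition log lines into (other, shuffle) by a substring match."""
    other, shuffle = [], []
    for line in lines:
        (shuffle if marker in line else other).append(line)
    return other, shuffle


def split_syslog(src_path, shuffle_path, other_path):
    """Write shuffle lines and remaining lines of src_path to two files."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        other, shuffle = split_shuffle_lines(src)
    with open(shuffle_path, "w", encoding="utf-8") as out:
        out.writelines(shuffle)
    with open(other_path, "w", encoding="utf-8") as out:
        out.writelines(other)
    return len(other), len(shuffle)
```

      For example, split_syslog("syslog", "syslog.shuffle", "syslog.rest") would leave the original file untouched and return the line counts of the two parts.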

        Attachments

        1. MAPREDUCE-5932.v01.patch
          12 kB
          Gera Shegalov
        2. MAPREDUCE-5932.v02.patch
          13 kB
          Gera Shegalov
        3. MAPREDUCE-5932.v03.patch
          20 kB
          Gera Shegalov
        4. MAPREDUCE-5932.v04.patch
          28 kB
          Gera Shegalov
        5. MAPREDUCE-5932.v05.patch
          29 kB
          Gera Shegalov
        6. MAPREDUCE-5932.v06.patch
          29 kB
          Gera Shegalov

            People

            • Assignee: Gera Shegalov (jira.shegalov)
            • Reporter: Gera Shegalov (jira.shegalov)