Hadoop Map/Reduce
MAPREDUCE-5932

Provide an option to use a dedicated reduce-side shuffle log

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.7.0
    • Component/s: mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      For reducers in large jobs, our users cannot easily spot the portions of the log that relate to problems in their own code. An example reducer with INFO-level logging generates ~3500 lines (~700 KiB) of log per second. About 95% of that log, measured in bytes, comes from the client side of the shuffle, i.e. the loggers under org.apache.hadoop.mapreduce.task.reduce.*

      $ wc syslog 
          3642   48192  691013 syslog
      $ grep task.reduce syslog | wc 
          3424   46534  659038
      $ grep task.reduce.ShuffleScheduler syslog | wc 
          1521   17745  251458
      $ grep task.reduce.Fetcher syslog | wc 
          1045   15340  223683
      $ grep task.reduce.InMemoryMapOutput syslog | wc 
           400    4800   72060
      $ grep task.reduce.MergeManagerImpl syslog | wc 
           432    8200  106555
      

      Byte percentage breakdown:

      Shuffle total:           95%
      
      ShuffleScheduler:        36%
      Fetcher:                 32%
      InMemoryMapOutput:       10%
      MergeManagerImpl:        15%
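
      The percentages above can be rechecked from the `wc` byte counts. The following is a minimal Python sketch, not part of the patch; the names `TOTAL_BYTES`, `SHUFFLE_BYTES`, and `byte_share` are made up for illustration, and the numbers are copied from the third column of the `wc` output in this description.

```python
# Byte counts copied from the third column of the `wc` output above.
TOTAL_BYTES = 691013  # whole syslog

SHUFFLE_BYTES = {
    "Shuffle total": 659038,      # grep task.reduce
    "ShuffleScheduler": 251458,
    "Fetcher": 223683,
    "InMemoryMapOutput": 72060,
    "MergeManagerImpl": 106555,
}


def byte_share(part_bytes, total_bytes=TOTAL_BYTES):
    """Return the share of the log taken by part_bytes, in whole percent."""
    return round(100 * part_bytes / total_bytes)


# Hypothetical helper structure: logger name -> rounded byte percentage.
breakdown = {name: byte_share(n) for name, n in SHUFFLE_BYTES.items()}

for name, pct in breakdown.items():
    print(f"{name + ':':<25}{pct}%")
```

      The four per-class shares add up to 93%; the remaining ~2% of the shuffle total comes from other loggers in the same package.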
      

      While this information is often useful for devops debugging shuffle performance issues, job users are often lost in it.

      We propose to have a dedicated syslog.shuffle file.
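
      To illustrate the intended effect, here is a rough Python sketch that partitions existing log lines by the shuffle logger prefix, mirroring the `grep task.reduce` commands above. It is an after-the-fact filter for illustration only and is not how the attached patches implement the feature; the function name `split_shuffle_lines` and the sample log lines are hypothetical.

```python
# Logger package named in the description; every shuffle client class lives under it.
SHUFFLE_LOGGER_PREFIX = "org.apache.hadoop.mapreduce.task.reduce."


def split_shuffle_lines(lines, prefix=SHUFFLE_LOGGER_PREFIX):
    """Partition log lines into (other, shuffle) by logger-name substring.

    Assumption: each line carries its logger name. Continuation lines
    (for example stack traces) contain no logger name and fall into `other`.
    """
    other, shuffle = [], []
    for line in lines:
        (shuffle if prefix in line else other).append(line)
    return other, shuffle


# Hypothetical sample lines, invented for this example.
sample = [
    "INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: sample fetch message",
    "INFO [main] com.example.MyReducer: sample user message",
    "INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: sample merge message",
]

other, shuffle = split_shuffle_lines(sample)
print(len(other), "line(s) would stay in syslog")
print(len(shuffle), "line(s) would go to syslog.shuffle")
```

      With a split like this, the user's own reducer messages stay readable in syslog while the shuffle detail remains available separately for performance debugging.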

      1. MAPREDUCE-5932.v06.patch
        29 kB
        Gera Shegalov
      2. MAPREDUCE-5932.v05.patch
        29 kB
        Gera Shegalov
      3. MAPREDUCE-5932.v04.patch
        28 kB
        Gera Shegalov
      4. MAPREDUCE-5932.v03.patch
        20 kB
        Gera Shegalov
      5. MAPREDUCE-5932.v02.patch
        13 kB
        Gera Shegalov
      6. MAPREDUCE-5932.v01.patch
        12 kB
        Gera Shegalov

          People

          • Assignee:
            Gera Shegalov
            Reporter:
            Gera Shegalov
          • Votes:
            0
            Watchers:
            6
