  1. Hadoop Map/Reduce
  2. MAPREDUCE-1922

Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None
    • All

    Description

      As more and more applications use CombineFileInputFormat (to reduce the number of mappers), formats whose column groups are stored as separate HDFS files (Zebra, HBase), and composite input formats (map-side joins), the data-local and rack-local task counters lose their meaning. (A map task that reads only one column group, say 20% of its input, locally and the remaining 80% remotely is still flagged as a data-local map.)

      So, my suggestion is to drop these counters and replace them with HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. Byte-level counters would make it easier to reason about read performance for maps, because they report how much of the input was actually read locally rather than a single per-task flag.
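
      A minimal sketch of how the proposed byte-level counters might be accumulated, written as plain Java with no Hadoop dependency. Only the three counter names come from the proposal; the `Locality` enum, the `ReadLocalityCounters` class, and its methods are hypothetical names introduced for illustration. The sketch also assumes that HDFS_RACK_BYTES_READ counts only bytes read from another node on the same rack (excluding node-local bytes), which the proposal does not specify.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical: where the bytes of one read were served from, relative to the map task. */
enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

/** Counter names as given in the proposal. */
enum HdfsReadCounter { HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, HDFS_TOTAL_BYTES_READ }

/** Hypothetical per-task accumulator; not part of any Hadoop API. */
final class ReadLocalityCounters {
    private final Map<HdfsReadCounter, Long> values = new EnumMap<>(HdfsReadCounter.class);

    ReadLocalityCounters() {
        // Start every counter at zero so get() never returns null.
        for (HdfsReadCounter c : HdfsReadCounter.values()) {
            values.put(c, 0L);
        }
    }

    /** Records one read of {@code bytes} bytes served at the given locality. */
    void recordRead(Locality locality, long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        // Every read counts toward the total, wherever it was served from.
        add(HdfsReadCounter.HDFS_TOTAL_BYTES_READ, bytes);
        if (locality == Locality.NODE_LOCAL) {
            add(HdfsReadCounter.HDFS_LOCAL_BYTES_READ, bytes);
        } else if (locality == Locality.RACK_LOCAL) {
            // Assumption: the rack counter excludes node-local bytes.
            add(HdfsReadCounter.HDFS_RACK_BYTES_READ, bytes);
        }
        // Off-rack reads show up only in the total.
    }

    long get(HdfsReadCounter counter) {
        return values.get(counter);
    }

    /** Fraction of all bytes read node-locally; 0.0 if nothing has been read yet. */
    double localFraction() {
        long total = get(HdfsReadCounter.HDFS_TOTAL_BYTES_READ);
        return total == 0 ? 0.0 : (double) get(HdfsReadCounter.HDFS_LOCAL_BYTES_READ) / total;
    }

    private void add(HdfsReadCounter counter, long bytes) {
        values.merge(counter, bytes, Long::sum);
    }
}
```

      For the map task in the example above, recording 20 bytes as NODE_LOCAL and 80 bytes as OFF_RACK leaves the local counter at 20 and the total at 100, so the task shows up as 20% local instead of being flagged as a data-local map.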

          People

            acmurthy Arun Murthy
            milindb Milind Barve
            Votes: 0
            Watchers: 4
