Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Most applications of MapReduce care about grouping, not sorting. Sorting is a (relatively expensive) way to achieve grouping. To achieve just grouping, one can:

      • Replace the sort on the Mappers with a HashTable, and maintain a list of key-value tuples for each hash bucket.
      • Sort the key-value tuples inside each hash bucket before spilling or sending to the Reducer. The Combiner can be invoked any time this is done.
      • Serialize the HashTable by hash-bucket id, so that merges (of either spills or Map outputs) work much as they do today (at least there is no change in the overall compute complexity of the merge).
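
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Hadoop code: `HashGroupingBuffer`, `bucket_of`, and `merge_spills` are hypothetical names, `zlib.crc32` is an arbitrary choice of stable hash, and the sketch assumes that "serialized by hash-bucket id" means records are written in ascending bucket-id order.

```python
import heapq
import zlib
from collections import defaultdict
from itertools import groupby


def bucket_of(key, num_buckets):
    """Stable hash bucket for a string key (crc32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


class HashGroupingBuffer:
    """Hypothetical map-side buffer: groups by hash bucket instead of a global sort."""

    def __init__(self, num_buckets, combiner=None):
        self.num_buckets = num_buckets
        self.combiner = combiner          # optional function: list of values -> one value
        self.buckets = defaultdict(list)  # bucket id -> list of (key, value)

    def collect(self, key, value):
        # Constant-time append; nothing is sorted while the map task is emitting.
        self.buckets[bucket_of(key, self.num_buckets)].append((key, value))

    def spill(self):
        """Return records as (bucket_id, key, value), ordered by (bucket_id, key)."""
        out = []
        for bucket_id in sorted(self.buckets):
            # Only the (small) contents of one bucket are sorted at a time.
            records = sorted(self.buckets[bucket_id], key=lambda kv: kv[0])
            for key, group in groupby(records, key=lambda kv: kv[0]):
                values = [v for _, v in group]
                if self.combiner is not None:
                    values = [self.combiner(values)]
                out.extend((bucket_id, key, v) for v in values)
        self.buckets.clear()
        return out


def merge_spills(spills):
    """K-way merge of spills; each spill is already ordered by (bucket_id, key)."""
    return list(heapq.merge(*spills, key=lambda rec: (rec[0], rec[1])))
```

Because every spill is ordered by `(bucket_id, key)` and a given key always lands in the same bucket, a standard k-way merge still brings all records for a key together, which is all that grouping needs. For example, collecting words with `combiner=sum`, calling `spill()` twice, and passing both results to `merge_spills` yields one contiguous run per word.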

      Of course, this hashtable has nothing to do with partitioning; it is just a replacement for the map-side sort.
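
To make the distinction concrete, here is a small sketch in Python in which the reducer partition and the grouping bucket are computed independently. All names are hypothetical, and using a different hash function for the bucket (`zlib.adler32`) than for the partition (`zlib.crc32`) is an assumption of this sketch, not something the proposal specifies.

```python
import zlib


def partition_of(key, num_reducers):
    """Which reducer receives the key (stand-in for the job's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def grouping_bucket_of(key, num_buckets):
    """Which map-side hash bucket holds the key; unrelated to the partition."""
    return zlib.adler32(key.encode("utf-8")) % num_buckets


def spill_order(key, num_reducers, num_buckets):
    """Ordering used within map output: partition first, then bucket, then key."""
    return (
        partition_of(key, num_reducers),
        grouping_bucket_of(key, num_buckets),
        key,
    )
```

Sorting keys by `spill_order` keeps each reducer's data contiguous, as map output is today, while the bucket id only affects the order of records inside a partition.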

      This is (pretty much) straight from the MARS project paper: http://www.cse.ust.hk/catalac/papers/mars_pact08.pdf. They report a 45% speedup in inverted index calculation using hashing instead of sorting (the reference implementation is NOT against Hadoop, though).

        Issue Links

          Activity

          Joydeep Sen Sarma created issue -
          Todd Lipcon made changes -
          Field Original Value New Value
          Link This issue relates to MAPREDUCE-3235 [ MAPREDUCE-3235 ]
          Todd Lipcon made changes -
          Link This issue relates to HADOOP-7761 [ HADOOP-7761 ]
          Jerry Chen made changes -
          Link This issue depends on MAPREDUCE-2454 [ MAPREDUCE-2454 ]
          Gavin made changes -
          Link This issue depends on MAPREDUCE-2454 [ MAPREDUCE-2454 ]
          Gavin made changes -
          Link This issue depends upon MAPREDUCE-2454 [ MAPREDUCE-2454 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Joydeep Sen Sarma
            • Votes:
              1
              Watchers:
              42

              Dates

              • Created:
                Updated: