HIVE-9495: Map Side aggregation affecting map performance

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None
    • Environment: RHEL 6.4, Hortonworks Hadoop 2.2

    Description

      When running a simple aggregation query in Hive 0.14 with hive.map.aggr=true, the map tasks take much longer than with hive.map.aggr=false.

      For example, consider the query:

      INSERT OVERWRITE TABLE lineitem_tgt_agg
      select alias.a0 as a0,
       alias.a2 as a1,
       alias.a1 as a2,
       alias.a3 as a3,
       alias.a4 as a4
      from (
       select alias.a0 as a0,
        SUM(alias.a1) as a1,
        SUM(alias.a2) as a2,
        SUM(alias.a3) as a3,
        SUM(alias.a4) as a4
       from (
        select lineitem_sf500.l_orderkey as a0,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 - lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1,
         lineitem_sf500.l_quantity as a2,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_discount as double) as a3,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_tax as double) as a4
        from lineitem_sf500
        ) alias
       group by alias.a0
       ) alias;
      

      The query above was run against ~376 GB of source data (~3 billion records).
      With hive.map.aggr=false, it takes ~10 minutes.
      With hive.map.aggr=true, the map tasks do not complete even after an hour.
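
      The two runs described above could be reproduced in a single Hive session roughly as follows. This is a minimal sketch, not the reporter's actual script: only hive.map.aggr changes between runs. The last three statements print other standard Hive properties that govern map-side hash aggregation. They are listed as settings worth inspecting, not as a known cause of or fix for this issue.

      ```sql
      -- Run 1: map-side aggregation disabled (reported to finish in ~10 minutes).
      SET hive.map.aggr=false;
      -- ... run the INSERT OVERWRITE query shown above ...

      -- Run 2: map-side aggregation enabled (map tasks reported not to finish within an hour).
      SET hive.map.aggr=true;
      -- ... run the same query again ...

      -- SET <name>; with no value prints the current setting.
      SET hive.map.aggr.hash.percentmemory;
      SET hive.map.aggr.hash.min.reduction;
      SET hive.groupby.mapaggr.checkinterval;
      ```

      Recording the printed values alongside the timings makes the two runs easier to compare across clusters.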

      Attachments

        1. profiler_screenshot.PNG
          60 kB
          Anand Sridharan
        2. HIVE-9495.1.patch.txt
          43 kB
          Navis Ryu
        3. HIVE-9495.2.patch.txt
          19 kB
          Navis Ryu

            People

              Assignee: Anand Sridharan (asridhar)
              Reporter: Anand Sridharan (asridhar)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: