Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9495

Map Side aggregation affecting map performance

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None
    • Environment:

      RHEL 6.4
      Hortonworks Hadoop 2.2

      Description

      When trying to run a simple aggregation query with hive.map.aggr=true, map tasks take a lot of time in Hive 0.14 as against with hive.map.aggr=false.

      e.g.
      Consider the query:

      INSERT OVERWRITE TABLE lineitem_tgt_agg
      select alias.a0 as a0,
       alias.a2 as a1,
       alias.a1 as a2,
       alias.a3 as a3,
       alias.a4 as a4
      from (
       select alias.a0 as a0,
        SUM(alias.a1) as a1,
        SUM(alias.a2) as a2,
        SUM(alias.a3) as a3,
        SUM(alias.a4) as a4
       from (
        select lineitem_sf500.l_orderkey as a0,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 - lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1,
         lineitem_sf500.l_quantity as a2,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_discount as double) as a3,
         CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_tax as double) as a4
        from lineitem_sf500
        ) alias
       group by alias.a0
       ) alias;
      

      The above query was run with ~376GB of data / ~3billion records in the source.
      It takes ~10 minutes with hive.map.aggr=false.
      With map side aggregation set to true, the map tasks don't complete even after an hour.

        Attachments

        1. HIVE-9495.2.patch.txt
          19 kB
          Navis Ryu
        2. HIVE-9495.1.patch.txt
          43 kB
          Navis Ryu
        3. profiler_screenshot.PNG
          60 kB
          Anand Sridharan

          Issue Links

            Activity

              People

              • Assignee:
                asridhar Anand Sridharan
                Reporter:
                asridhar Anand Sridharan
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: