Pig / PIG-2829

Use partial aggregation more aggressively


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Partial aggregation (hash aggregation, also known as an in-map combiner) is a new feature in Pig 0.10 that performs aggregation inside the map function. Its main advantages over the combiner are that it avoids serializing, deserializing, and sorting the data, and that it can disable itself automatically when the data reduction rate is low. It is currently disabled by default.
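      For readers unfamiliar with the idea, here is a minimal Python sketch of in-map hash aggregation for a simple count. It is not Pig's implementation; the function names are hypothetical and only illustrate why fewer records reach the serialize-and-sort stage.

```python
from collections import defaultdict


def map_without_partial_agg(records):
    """Emit one (key, 1) pair per input record.

    Every pair would then be serialized and sorted before a combiner sees it.
    """
    return [(key, 1) for key in records]


def map_with_partial_agg(records):
    """Aggregate counts in an in-memory hash table inside the map step.

    Only one pair per distinct key is emitted downstream.
    """
    counts = defaultdict(int)
    for key in records:
        counts[key] += 1
    return list(counts.items())


records = ["a", "b", "a", "c", "a", "b"]
plain = map_without_partial_agg(records)
aggregated = map_with_partial_agg(records)
```

      In this toy input, the plain mapper emits one pair per record, while the aggregating mapper emits one pair per distinct key.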

      To leverage the power of PartialAgg more aggressively, several things need to be revisited:

      1. The auto-disable threshold. Currently each mapper examines the first 1k records (hard-coded) to check whether the data size is reduced enough (10x by default, configurable). The check happens earlier if the hash table fills up before 1k records have been processed (the hash table size is controlled by pig.cachedbag.memusage). We might want to relax these thresholds.

      2. Dependency on the combiner. Currently PartialAgg does not work without a combiner following it, so we need to provide separate options to enable each independently.
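      The auto-disable check described in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not Pig's code: the class and parameter names (`PartialAggregator`, `check_after`, `min_reduction`, `max_keys`) are hypothetical, the reduction is measured as records seen divided by distinct keys rather than by data size, and a key-count cap stands in for the memory-based hash table limit.

```python
class PartialAggregator:
    """Hypothetical in-map sum aggregator with a one-time auto-disable check."""

    def __init__(self, check_after=1000, min_reduction=10.0, max_keys=100_000):
        self.check_after = check_after      # records to observe before the check
        self.min_reduction = min_reduction  # required input/output ratio (e.g. 10x)
        self.max_keys = max_keys            # stand-in for the hash table memory limit
        self.table = {}
        self.seen = 0
        self.checked = False
        self.disabled = False

    def add(self, key, value):
        """Consume one record and return the pairs to emit downstream right now."""
        if self.disabled:
            # Aggregation is off: pass records straight through.
            return [(key, value)]
        self.seen += 1
        self.table[key] = self.table.get(key, 0) + value
        table_full = len(self.table) >= self.max_keys
        # Run the check once, either after check_after records or earlier
        # if the table fills up first.
        if not self.checked and (self.seen >= self.check_after or table_full):
            self.checked = True
            if self.seen / len(self.table) < self.min_reduction:
                self.disabled = True
                return self.flush()
        if table_full:
            # Still enabled but out of room: emit partial results and start over.
            return self.flush()
        return []

    def flush(self):
        """Emit and clear whatever is currently held in the hash table."""
        out = list(self.table.items())
        self.table.clear()
        return out


# Low-cardinality input keeps aggregation on; unique keys switch it off.
skewed = PartialAggregator()
for i in range(2000):
    skewed.add(i % 10, 1)

unique = PartialAggregator()
for i in range(2000):
    unique.add(i, 1)
```

      Relaxing the thresholds, as proposed above, would correspond to raising `check_after` or lowering `min_reduction` in this sketch.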

      1. 2829.1.patch (11 kB, Jie Li)
      2. 2829.2.patch (20 kB, Jie Li)
      3. 2829.separate.options.patch (4 kB, Jie Li)
      4. pigmix-10G.png (114 kB, Jie Li)
      5. tpch-10G.png (100 kB, Jie Li)


          Jie Li created issue
          Jie Li made changes
            Field        Original Value   New Value
            Attachment                    pigmix-10G.png [ 12537264 ]
            Attachment                    tpch-10G.png [ 12537265 ]
          Jie Li made changes
            Link                          This issue is related to PIG-2228 [ PIG-2228 ]
          Jie Li made changes
            Attachment                    2829.separate.options.patch [ 12538029 ]
          Jie Li made changes
            Attachment                    2829.1.patch [ 12538093 ]
          Jie Li made changes
            Attachment                    2829.2.patch [ 12538211 ]


            • Assignee: Jie Li
            • Votes: 0
            • Watchers: 4


              • Created: