Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1735

Use combiner in cogroup

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      As reported by Scott Carey in PIG-479, combiner does not get used for co-group, even if the functions applied on the bags are algebraic . -
      Quoting from the comment -
      "For example, I'm not quite sure why this one doesn't use a combiner - it reads ~350x as much input bytes from HDFS as its reduce output, a combiner would be very effective:

      J = COGROUP
      UV BY (s, d, h, g, p, pa, st) OUTER,
      UC BY (s, d, h, g, p, pa, st) OUTER,
      AT BY (s, d, h, g, p, pa, st) OUTER,
      V BY (s, d, h, g, p, pa, st) OUTER,
      C BY (s, d, h, g, p, pa, st) OUTER;

      OUTPUT = FOREACH J GENERATE
      FLATTEN(group) as (s, d, h, g, p, pa, st),
      COUNT_STAR(C) as c,
      COUNT_STAR(V) as v,
      SUM(AT.p1) as p1,
      SUM(AT.p2) as p2,
      SUM(AT.p3) as p3,
      SUM(UC.q) as ucq,
      SUM(UC.r) as ucr,
      SUM(UV.q) as uvq,
      SUM(UV.r) as uvr;
      "

      Attachments

        Activity

          People

            Unassigned Unassigned
            thejas Thejas Nair
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: