Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: impl
    • Labels:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Allow crossing two or more bags inside a foreach statement. For example:

      user = load 'user' as (uid, age, gender, region);
      session = load 'session' as (uid, region);
      C = cogroup user by uid, session by uid;
      D = foreach C {
          crossed = cross user, session;
          generate crossed;
      }

      Description

      It is useful to support cross inside a nested foreach statement. A typical use case for nested foreach arises after cogrouping two relations: we want to flatten the records that share a key and then do some processing on them. This is naturally achieved with cross. For example:

      C = cogroup user by uid, session by uid;
      D = foreach C {
          crossed = cross user, session; -- Flatten the two input bags
          filtered = filter crossed by user::region == session::region;
          result = foreach filtered generate processSession(user::age, user::gender, session::ip); -- Nested foreach, see PIG-1631
          generate result;
      }
      

      Without cross, the user has to write a UDF that processes the bags user and session directly. That is much harder than writing a UDF that processes flattened tuples. This is especially true when nested foreach statements are involved (PIG-1631).
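
      To make the intended semantics concrete, here is a minimal Python sketch of what cogroup followed by a nested cross and filter computes. It is only an illustration, not Pig's implementation: the helper names (cogroup, nested_cross) and the sample rows are made up for this example, and the schemas follow the release note (user = uid, age, gender, region; session = uid, region).

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key_left, key_right):
    """Group two lists of tuples by key: key -> (left bag, right bag)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key_left(row)][0].append(row)
    for row in right:
        groups[key_right(row)][1].append(row)
    return dict(groups)


def nested_cross(groups):
    """Per key, take the Cartesian product of the two bags and
    concatenate each pair of tuples into one flattened tuple."""
    return {
        key: [u + s for u, s in product(users, sessions)]
        for key, (users, sessions) in groups.items()
    }


# Hypothetical sample data (not from the ticket).
user = [(1, 30, "F", "us"), (2, 25, "M", "eu")]
session = [(1, "us"), (1, "eu"), (3, "us")]

C = cogroup(user, session, lambda u: u[0], lambda s: s[0])
D = nested_cross(C)

# Counterpart of: filter crossed by user::region == session::region
# In the flattened tuple, index 3 is user region and index 5 is session region.
filtered = {
    key: [t for t in bag if t[3] == t[5]]
    for key, bag in D.items()
}
```

      In this sketch, a key that appears on only one side ends up with an empty crossed bag, because a Cartesian product with an empty collection is empty. Each flattened tuple can then be handed to an ordinary per-tuple function, which is the simplification the description argues for.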

      This is a candidate project for Google Summer of Code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011

        Attachments

        1. PIG-1916_5.patch
          39 kB
          Daniel Dai
        2. PIG-1916_4.patch
          38 kB
          Zhijie Shen
        3. PIG-1916_3.patch
          31 kB
          Zhijie Shen
        4. PIG-1916_2.patch
          31 kB
          Zhijie Shen
        5. PIG-1916_1.patch
          10 kB
          Zhijie Shen

              People

              • Assignee:
                zjshen Zhijie Shen
                Reporter:
                daijy Daniel Dai
              • Votes:
                0
                Watchers:
                2
