Pig / PIG-578

join ... outer, ... outer semantics are a no-op, should produce corresponding null values

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.4.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, the "OUTER" modifier in the JOIN statement is a no-op: the result of a JOIN is always an inner join. Now that the Pig types branch properly supports null values, the semantics of JOIN ... OUTER, ... OUTER should be corrected to perform a proper outer join, populating the missing fields with nulls.

      Here's the example:

      A = load 'a.txt' using PigStorage() as ( comment, value );
      B = load 'b.txt' using PigStorage() as ( comment, value );

      -- The OUTER clause is ignored in the JOIN statement and does not populate the tuple with
      -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.

      ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
      describe ABOuterJoin;
      dump ABOuterJoin;

      The file a.txt contains:
      a-only 1
      ab-both 2

      The file b.txt contains:
      ab-both 2
      b-only 3

      When you execute the script today, the dump results are:

      (ab-both,2,ab-both,2)

      The expected dump results are:

      (a-only,1,,)
      (ab-both,2,ab-both,2)
      (,,b-only,3)
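
      The requested behavior matches a full outer join on the first field. As a rough illustration only, here is a minimal Python sketch of those semantics applied to the sample data above. It is not Pig code and says nothing about how Pig implements the join; the function name full_outer_join is hypothetical, None stands in for a Pig null, and the sketch assumes null keys never match each other.

```python
from collections import defaultdict


def full_outer_join(left, right, key=0):
    """Full outer join of two lists of tuples on column `key`.

    Rows without a partner on the other side are padded with None,
    which stands in for a Pig null in this sketch.
    """
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0

    # Index the right-hand rows by join key for quick lookup.
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        k = lrow[key]
        # Assumption: a null (None) key never matches anything.
        matches = right_by_key.get(k, []) if k is not None else []
        if matches:
            matched_keys.add(k)
            for rrow in matches:
                result.append(lrow + rrow)
        else:
            # Left-only row: pad the right-hand fields with nulls.
            result.append(lrow + (None,) * right_width)

    # Right-only rows: pad the left-hand fields with nulls.
    for rrow in right:
        if rrow[key] not in matched_keys:
            result.append((None,) * left_width + rrow)
    return result


a = [("a-only", 1), ("ab-both", 2)]   # contents of a.txt
b = [("ab-both", 2), ("b-only", 3)]   # contents of b.txt

for row in full_outer_join(a, b):
    print(row)
```

      The three rows printed correspond to the three expected tuples above, with None in the positions where the dump would show empty fields.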

      Attachments

        1. PIG-578-2.patch
          37 kB
          Pradeep Kamath
        2. PIG-578.patch
          37 kB
          Pradeep Kamath

          People

            Assignee: Pradeep Kamath (pkamath)
            Reporter: David Ciemiewicz (ciemo)