Pig / PIG-435

wrong columns produced if incomplete definition provided during load

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      Script:

      A = load 'studenttab10k' as (name); -- note that the data has more than 1 column
      B = load 'votertab10k' as (name, age, reg, contrib);
      D = COGROUP A by name, B by name;
      E = foreach D generate flatten(A), flatten(B);
      F = foreach E generate reg, contrib;
      dump F;

      The dump produces the wrong columns. Even though only one column is declared for A, the load actually reads all of A's columns. So anywhere we explicitly or implicitly use A.*, as flatten does here, we produce the wrong results.
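
The mismatch described above can be illustrated with a small Python model. This is only a sketch, not Pig's implementation: it assumes columns are resolved by their position in the planned schema, and the sample rows and the `resolve` helper are hypothetical, since the report does not show the contents of the data files.

```python
# Hypothetical sample rows; the real data files are not shown in the report.
student_row = ("alice", 20, 3.5)               # A: three fields on disk, only `name` declared
voter_row = ("alice", 20, "democrat", 100.0)   # B: all four fields declared

declared_a = ["name"]
declared_b = ["name", "age", "reg", "contrib"]

# Schema the plan assumes after flatten(A), flatten(B): one column for A, four for B.
planned_schema = [f"A::{c}" for c in declared_a] + [f"B::{c}" for c in declared_b]

# Tuple actually produced: flatten(A) emits every loaded field, not just the declared one.
actual_tuple = student_row + voter_row


def resolve(column):
    """Fetch a column by its planned position (an assumption of this model)."""
    return actual_tuple[planned_schema.index(column)]


# The plan expects reg and contrib at positions 3 and 4,
# but the two extra fields of A have shifted B's fields to the right.
wrong = (resolve("B::reg"), resolve("B::contrib"))
print(wrong)
```

In this model the lookup returns B's name and age rather than `reg` and `contrib`, because the two undeclared fields of A push every field of B two positions to the right.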

      The long-term solution is to push projections into the load. In the shorter term, the proposal is to detect when the script uses A.* and insert a projection after the load. This is not needed when types are declared, because a casting foreach is already placed after the load in that case.
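
The short-term proposal can be sketched the same way in Python. Again this is a simplified model under assumptions, not the actual fix: the `project` helper and the sample rows are hypothetical, and columns are assumed to be resolved by position in the planned schema.

```python
def project(row, declared_width):
    """Model of a projection placed after the load: keep only the declared leading fields."""
    return row[:declared_width]


# Hypothetical sample rows; the real data files are not shown in the report.
student_row = ("alice", 20, 3.5)               # A: three fields on disk, only `name` declared
voter_row = ("alice", 20, "democrat", 100.0)   # B: all four fields declared

declared_a = ["name"]
declared_b = ["name", "age", "reg", "contrib"]
planned_schema = [f"A::{c}" for c in declared_a] + [f"B::{c}" for c in declared_b]

# With the projection in place, A contributes exactly as many fields as were declared.
a_projected = project(student_row, len(declared_a))
flattened = a_projected + voter_row

# Planned positions and actual positions now agree.
result = (
    flattened[planned_schema.index("B::reg")],
    flattened[planned_schema.index("B::contrib")],
)
print(result)
```

Trimming A to its declared width before the flatten keeps the actual tuple the same length as the planned schema, so the positional lookup lands on the intended fields.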

        Activity

        Olga Natkovich added a comment -

        Needs further discussion

        Olga Natkovich added a comment -

        This issue will be solved as part of the fix to https://issues.apache.org/jira/browse/PIG-1188


          People

          • Assignee: Daniel Dai
          • Reporter: Olga Natkovich
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:
