Pig / PIG-2659

Add source location of the aliases in the physical plan

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The goal is to provide better information about what is actually running in a job,
      in particular when alias names are reused.

      For example, with the following script:

      A = LOAD 'foo' using PigStorage();
      B = GROUP A BY $0;
      A = FOREACH B GENERATE COUNT(A);
      STORE A INTO 'bar';
      

      The job conf will contain the following entry:

      pig.alias.location: M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]
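      To consume this entry outside of Pig (for example in a job monitoring tool), the value can be parsed back into per-phase lists. The sketch below is a minimal Python parser under stated assumptions: it assumes that M, C and R stand for the map, combine and reduce phases and that each alias[x,y] pair is the line and column of the statement in the script. That reading is consistent with the example above (each statement's operator keyword starts at 0-based column 4), but it is an interpretation, not a documented format. parse_alias_location is a hypothetical helper name, not part of Pig.

```python
import re
from typing import Dict, List, Optional, Tuple

# (alias, line, column): assumed meaning of an entry such as "A[3,4]"
Location = Tuple[str, int, int]

# Assumed meaning of the phase markers in pig.alias.location
PHASE_NAMES = {"M": "map", "C": "combine", "R": "reduce"}

# One entry: an alias name followed by "[line,column]"
ENTRY_RE = re.compile(r"([A-Za-z_]\w*)\[(\d+),(\d+)\]")


def parse_alias_location(value: str) -> Dict[str, List[Location]]:
    """Hypothetical helper: split a pig.alias.location value into phases."""
    phases: Dict[str, List[Location]] = {}
    current: Optional[str] = None
    for token in value.split():
        if token.endswith(":") and token[:-1] in PHASE_NAMES:
            # A phase marker such as "M:" starts a new list of entries.
            current = PHASE_NAMES[token[:-1]]
            phases[current] = []
        elif current is not None:
            # A comma-separated run of entries, e.g. "A[1,4],A[3,4],B[2,4]".
            for alias, line, col in ENTRY_RE.findall(token):
                phases[current].append((alias, int(line), int(col)))
        else:
            raise ValueError(f"entry before any phase marker: {token!r}")
    return phases


conf_value = "M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]"
phases = parse_alias_location(conf_value)

# The reused alias A appears twice in the map phase, once per source line,
# so the two statements named A can be told apart.
map_lines_for_a = [line for alias, line, _ in phases["map"] if alias == "A"]
print(map_lines_for_a)  # [1, 3]
```

      Keeping the line number next to the alias is what disambiguates the two statements that both assign A; the alias name alone (as in pig.alias) would not.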
      

      A caveat is that the Logical Plan Optimizer discards the original location information when it merges logical operators.
      This is already the case today with pig.alias.

      Attachments

      1. PIG-2659.patch (194 kB, Julien Le Dem)
      2. PIG-2659_d.patch (53 kB, Julien Le Dem)
      3. PIG-2659_c.patch (52 kB, Julien Le Dem)
      4. PIG-2659_b.patch (41 kB, Julien Le Dem)
      5. PIG-2659_a.patch (41 kB, Julien Le Dem)


          People

          • Assignee: Julien Le Dem
          • Reporter: Julien Le Dem
          • Votes: 0
          • Watchers: 1
