  1. Pig
  2. PIG-2659

add source location of the aliases in the physical plan

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The goal is to provide better information about what is actually running in a job,
      in particular when alias names are reused.

      For example, with the following script:

      A = LOAD 'foo' using PigStorage();
      B = GROUP A BY $0;
      A = FOREACH B GENERATE COUNT(A);
      STORE A INTO 'bar';
      

      The job conf will contain the following information:

      pig.alias.location: M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]
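
      As an illustration of how this value could be consumed, here is a minimal
      Python sketch that parses it. It assumes that M, C and R stand for the map,
      combine and reduce phases and that each bracketed pair is a [line,column]
      position in the script; the ticket does not spell this out. The function
      name parse_alias_location is hypothetical and not part of Pig.

```python
import re
from typing import Dict, List, Tuple

# Assumed entry shape: alias[line,column], e.g. "A[1,4]".
ENTRY_RE = re.compile(r"(\w+)\[(-?\d+),(-?\d+)\]")


def parse_alias_location(value: str) -> Dict[str, List[Tuple[str, int, int]]]:
    """Split a pig.alias.location value into {phase: [(alias, line, column), ...]}.

    Hypothetical helper: assumes phases are introduced by tokens ending in ':'
    (such as "M:") and are followed by a comma-separated list of entries.
    """
    phases: Dict[str, List[Tuple[str, int, int]]] = {}
    current = None
    for token in value.split():
        if token.endswith(":"):
            # Start of a new phase section, e.g. "M:" -> "M".
            current = token[:-1]
            phases[current] = []
        elif current is not None:
            # Collect every alias[line,column] entry in this token.
            for alias, line, column in ENTRY_RE.findall(token):
                phases[current].append((alias, int(line), int(column)))
    return phases


if __name__ == "__main__":
    sample = "M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]"
    for phase, entries in parse_alias_location(sample).items():
        print(phase, entries)
```

      With the value above, the M entry lists alias A twice, once for line 1 and
      once for line 3, which is what makes a reused alias distinguishable.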
      

      A caveat is that the Logical Plan Optimizer throws away the original information when it merges Logical Operators.
      This is already the case today with pig.alias.

        Attachments

        1. PIG-2659.patch
          194 kB
          Julien Le Dem
        2. PIG-2659_a.patch
          41 kB
          Julien Le Dem
        3. PIG-2659_b.patch
          41 kB
          Julien Le Dem
        4. PIG-2659_c.patch
          52 kB
          Julien Le Dem
        5. PIG-2659_d.patch
          53 kB
          Julien Le Dem

            People

            • Assignee:
              julienledem Julien Le Dem
              Reporter:
              julienledem Julien Le Dem
            • Votes:
              0
              Watchers:
              1