Pig / PIG-2659

Add source location of the aliases in the physical plan

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.11
    • impl
    • None
    • Patch Available

    Description

      The goal is to provide better information about what is actually running in a job,
      in particular when alias names are reused.

      For example, with the following script:

      A = LOAD 'foo' using PigStorage();
      B = GROUP A BY $0;
      A = FOREACH B GENERATE COUNT(A);
      STORE A INTO 'bar';
      

      The job conf will contain the following information:

      pig.alias.location: M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]
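
A minimal sketch of how a consumer of the job conf might split this value into per-phase entries. It assumes, based only on the sample above, that each entry has the shape alias[line,column] and that the single-letter prefixes (M, C, R) mark job phases, presumably map, combine, and reduce. The function name `parse_alias_location` is hypothetical and is not part of Pig.

```python
import re
from typing import Dict, List, Tuple

# Assumed entry shape, inferred from the sample value: alias[line,column]
_ENTRY = re.compile(r"(\w+)\[(\d+),(\d+)\]")
# Assumed phase marker: one capital letter followed by a colon, e.g. "M:"
_PHASE = re.compile(r"\b([A-Z]):\s*")


def parse_alias_location(value: str) -> Dict[str, List[Tuple[str, int, int]]]:
    """Split a pig.alias.location string into {phase: [(alias, line, column), ...]}.

    Hypothetical helper for illustration; not an API provided by Pig.
    """
    # re.split with one capture group yields ['', 'M', '<entries>', 'C', ...]
    parts = _PHASE.split(value.strip())
    result: Dict[str, List[Tuple[str, int, int]]] = {}
    for phase, body in zip(parts[1::2], parts[2::2]):
        result[phase] = [
            (alias, int(line), int(col))
            for alias, line, col in _ENTRY.findall(body)
        ]
    return result


sample = "M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]"
locations = parse_alias_location(sample)
```

With the sample value, the two uses of alias A stay distinguishable: the entry for line 1 appears only under M, while the entry for line 3 appears under all three prefixes.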
      

      A caveat is that the Logical Plan Optimizer throws away the original information when merging Logical Operators.
      This is already the case today with pig.alias.

      Attachments

        1. PIG-2659.patch
          194 kB
          Julien Le Dem
        2. PIG-2659_d.patch
          53 kB
          Julien Le Dem
        3. PIG-2659_c.patch
          52 kB
          Julien Le Dem
        4. PIG-2659_b.patch
          41 kB
          Julien Le Dem
        5. PIG-2659_a.patch
          41 kB
          Julien Le Dem

          People

            julienledem Julien Le Dem