Pig
  1. Pig
  2. PIG-2659

add source location of the aliases in the physical plan

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The goal is to provide better information about what is actually running in a job.
      In particular when alias names are being reused.

      For example with the following script:

      A = LOAD 'foo' using PigStorage();
      B = GROUP A BY $0;
      A = FOREACH B GENERATE COUNT(A);
      STORE A INTO 'bar';
      

      The job conf will contain the following information

      pig.alias.location: M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]
      

      A caveat is that the Logical Plan Optimizer throws away the original information when merging Logical Operators.
      this is already the case today with pig.alias

      1. PIG-2659.patch
        194 kB
        Julien Le Dem
      2. PIG-2659_a.patch
        41 kB
        Julien Le Dem
      3. PIG-2659_b.patch
        41 kB
        Julien Le Dem
      4. PIG-2659_c.patch
        52 kB
        Julien Le Dem
      5. PIG-2659_d.patch
        53 kB
        Julien Le Dem

        Activity

        Hide
        Julien Le Dem added a comment -

        PIG-2659_d.patch
        adds logging as in PIG-2688

        Show
        Julien Le Dem added a comment - PIG-2659 _d.patch adds logging as in PIG-2688
        Hide
        Julien Le Dem added a comment -

        PIG-2659_c.patch
        makes the field not transient anymore so that it will be accessible in the backend. this is necessary as PigStats deserializes the plan from the job conf.
        TestMRCompiler's golden files had to be updated accordingly when it should not have.
        I opened a JIRA for this (PIG-2698)

        Show
        Julien Le Dem added a comment - PIG-2659 _c.patch makes the field not transient anymore so that it will be accessible in the backend. this is necessary as PigStats deserializes the plan from the job conf. TestMRCompiler's golden files had to be updated accordingly when it should not have. I opened a JIRA for this ( PIG-2698 )
        Hide
        Julien Le Dem added a comment -

        PIG-2659_b.patch rebased patch

        Show
        Julien Le Dem added a comment - PIG-2659 _b.patch rebased patch
        Hide
        Julien Le Dem added a comment -

        PIG-2659_a.patch is removing the unnecessary whitespace changes

        Show
        Julien Le Dem added a comment - PIG-2659 _a.patch is removing the unnecessary whitespace changes
        Hide
        Daniel Dai added a comment -

        Great, +1.

        Show
        Daniel Dai added a comment - Great, +1.
        Hide
        Julien Le Dem added a comment -

        Hi Daniel,
        Yes my original motivation was to use this in the visualizer.
        Note that I made the location transient for now because some tests are comparing serialized physical plans to a saved binary versions, which is very brittle and not precise enough. I.E.: it fails whatever you changed to the physical plan serialized representation.

        explanation of the data
        M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]
        M: Mapper plan contains the following aliases
        C: Combiner plan contains the following aliases
        R: reduce plane contains the following aliases
        each item is:
        alias[line,offset]
        This could also be split in 3 different properties:
        pig.alias.location.mapper: A[1,4],A[3,4],B[2,4]
        pig.alias.location.combiner: A[3,4],B[2,4]
        pig.alias.location.reducer: A[3,4]

        Yes, we should also make sure merged operators info is maintained.

        I will take care of the white spaces.

        There is a test here:
        test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java

        Show
        Julien Le Dem added a comment - Hi Daniel, Yes my original motivation was to use this in the visualizer. Note that I made the location transient for now because some tests are comparing serialized physical plans to a saved binary versions, which is very brittle and not precise enough. I.E.: it fails whatever you changed to the physical plan serialized representation. explanation of the data M: A [1,4] ,A [3,4] ,B [2,4] C: A [3,4] ,B [2,4] R: A [3,4] M: Mapper plan contains the following aliases C: Combiner plan contains the following aliases R: reduce plane contains the following aliases each item is: alias [line,offset] This could also be split in 3 different properties: pig.alias.location.mapper: A [1,4] ,A [3,4] ,B [2,4] pig.alias.location.combiner: A [3,4] ,B [2,4] pig.alias.location.reducer: A [3,4] Yes, we should also make sure merged operators info is maintained. I will take care of the white spaces. There is a test here: test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java
        Hide
        Daniel Dai added a comment -

        That's awesome, with this, we can do:

        1. Include line number for most backend exceptions
        2. Include line number in explain
        3. Include line number in stats
        4. Use line number to generate job name
        5. Make use of location in visualizer (PIG-2659)

        Can you elaborate what "M: A[1,4],A[3,4],B[2,4] C: A[3,4],B[2,4] R: A[3,4]" means?

        If we merge operators, we shall annotate how this operator come from, this can be done later.

        Also be sure to remove white space changes when you checkin (svn diff -x --ignore-all-space).

        Can you also add some tests?

        Show
        Daniel Dai added a comment - That's awesome, with this, we can do: 1. Include line number for most backend exceptions 2. Include line number in explain 3. Include line number in stats 4. Use line number to generate job name 5. Make use of location in visualizer ( PIG-2659 ) Can you elaborate what "M: A [1,4] ,A [3,4] ,B [2,4] C: A [3,4] ,B [2,4] R: A [3,4] " means? If we merge operators, we shall annotate how this operator come from, this can be done later. Also be sure to remove white space changes when you checkin (svn diff -x --ignore-all-space). Can you also add some tests?
        Hide
        Bill Graham added a comment -

        +1

        Looks good to me. There were a lot of whitespace changes in this patch FYI.

        Show
        Bill Graham added a comment - +1 Looks good to me. There were a lot of whitespace changes in this patch FYI.
        Hide
        Julien Le Dem added a comment -

        PIG-2659.patch implements this

        Show
        Julien Le Dem added a comment - PIG-2659 .patch implements this

          People

          • Assignee:
            Julien Le Dem
            Reporter:
            Julien Le Dem
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development