Pig
  1. Pig
  2. PIG-3793

Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: data
    • Labels:
      None
    • Release Note:
      Exposes methods getNumLogicalRelationOperators(), getNumSources() and getNumSinks() on LogicalPlanData.

      Description

      Its useful to have an understanding of how many operators are being used in the script via the API. This could allow admins to enforce checks/restrictions on the length/complexity of the plan in user scripts.

      1. PIG-3793.patch
        7 kB
        Prashant Kommireddi
      2. PIG-3793_2.patch
        8 kB
        Prashant Kommireddi

        Activity

        Hide
        Kyungho Jeon added a comment -

        Okay, I was confused. As PigServer is the only interface that Pig users can use for submitting a query and Pig doesn't expose LogicalPlan, there's no way to mutate currDAG.lp. So now I believe the current getNumLogicalRelationOperators() implementation is reasonable and having resetLogicalPlanData() will be sufficient.

        Show
        Kyungho Jeon added a comment - Okay, I was confused. As PigServer is the only interface that Pig users can use for submitting a query and Pig doesn't expose LogicalPlan , there's no way to mutate currDAG.lp. So now I believe the current getNumLogicalRelationOperators() implementation is reasonable and having resetLogicalPlanData() will be sufficient.
        Hide
        Prashant Kommireddi added a comment -

        It's cool, that's how the project evolves. Ideas are always welcome

        Sounds like you have a use-case where PigServer.registerQuery(String) could be followed by a call to LogicalPlanData. getNumLogicalRelationOperators() - both interleaved and multiple times. Basically registering queries and seeing how LP changes through the various registers. Would it suffice to have a resetLogicalPlanData() method on PigServer that recomputes the LogicalPlanData ?

        Show
        Prashant Kommireddi added a comment - It's cool, that's how the project evolves. Ideas are always welcome Sounds like you have a use-case where PigServer.registerQuery(String) could be followed by a call to LogicalPlanData. getNumLogicalRelationOperators() - both interleaved and multiple times. Basically registering queries and seeing how LP changes through the various registers. Would it suffice to have a resetLogicalPlanData() method on PigServer that recomputes the LogicalPlanData ?
        Hide
        Kyungho Jeon added a comment -

        Thank you for the response. I found that LogicalPlanData instance is created and initialized when PigServer.getLogicalPlanData() is called. Then, my concern becomes less significant.

        But now I am thinking that LogicalPlanData should work as an interface (in a general meaning, not a Java interface) to LogicalPlan rather than a stateful object, considering that the class was added to avoid exposing LogicalPlan. It won't be a problem in current Pig implementation right now. But if someone (like me, actually) is playing with LogicalPlan and its optimizers, for example AddForEach, then the fact that LogicalPlanData computes and keeps the value might be confusing.

        I have been following Pig for a year but I am still a newbie. Please forgive my pestering. I am just trying to understand the implementation. Any comment will be appreciated!

        Show
        Kyungho Jeon added a comment - Thank you for the response. I found that LogicalPlanData instance is created and initialized when PigServer.getLogicalPlanData() is called. Then, my concern becomes less significant. But now I am thinking that LogicalPlanData should work as an interface (in a general meaning, not a Java interface) to LogicalPlan rather than a stateful object, considering that the class was added to avoid exposing LogicalPlan . It won't be a problem in current Pig implementation right now. But if someone (like me, actually) is playing with LogicalPlan and its optimizers, for example AddForEach , then the fact that LogicalPlanData computes and keeps the value might be confusing. I have been following Pig for a year but I am still a newbie. Please forgive my pestering. I am just trying to understand the implementation. Any comment will be appreciated!
        Hide
        Prashant Kommireddi added a comment -

        Kyungho Jeon thanks for bringing this up. I did think about this and went with the other approach of going through LogicalPlan at the time of LogicalPlanData's creation. The other approach would require us going through the LogicalPlan for each of the method calls which would be expensive. I agree it could be documented better

        Show
        Prashant Kommireddi added a comment - Kyungho Jeon thanks for bringing this up. I did think about this and went with the other approach of going through LogicalPlan at the time of LogicalPlanData's creation. The other approach would require us going through the LogicalPlan for each of the method calls which would be expensive. I agree it could be documented better
        Hide
        Kyungho Jeon added a comment -

        I guess it depends on how the class LogicalPlanData is used, but wouldn't it be safer to compute the number of LogicalRelationOperators (sources and sinks too) on the fly? Then, it doesn't need to assume that the LogicalPlan is not changed.

        For example, getNumLogicalRelationOperators() can be changed as follows:

        diff --git a/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java b/src/org/apache/pi
        index 17a2045..82df2a8 100644
        — a/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java
        +++ b/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java
        @@ -80,7 +80,16 @@ public class LogicalPlanData {

        • */
          public int getNumLogicalRelationOperators() {

        • return this.numLogicalRelationalOperators;
          + int numLogicalRelationalOperators = 0;
          + Iterator<Operator> ops = this.lp.getOperators();
          +
          + while (ops.hasNext()) { + Operator op = ops.next(); + if (op instanceof LogicalRelationalOperator) + numLogicalRelationalOperators++; + }

          +
          + return numLogicalRelationalOperators;
          }

        /**

        Show
        Kyungho Jeon added a comment - I guess it depends on how the class LogicalPlanData is used, but wouldn't it be safer to compute the number of LogicalRelationOperators (sources and sinks too) on the fly? Then, it doesn't need to assume that the LogicalPlan is not changed. For example, getNumLogicalRelationOperators() can be changed as follows: diff --git a/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java b/src/org/apache/pi index 17a2045..82df2a8 100644 — a/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java +++ b/src/org/apache/pig/newplan/logical/relational/LogicalPlanData.java @@ -80,7 +80,16 @@ public class LogicalPlanData { */ public int getNumLogicalRelationOperators() { return this.numLogicalRelationalOperators; + int numLogicalRelationalOperators = 0; + Iterator<Operator> ops = this.lp.getOperators(); + + while (ops.hasNext()) { + Operator op = ops.next(); + if (op instanceof LogicalRelationalOperator) + numLogicalRelationalOperators++; + } + + return numLogicalRelationalOperators; } /**
        Hide
        Prashant Kommireddi added a comment -

        Made the changes and committed to trunk. Thanks Cheolsoo Park

        Show
        Prashant Kommireddi added a comment - Made the changes and committed to trunk. Thanks Cheolsoo Park
        Hide
        Cheolsoo Park added a comment -

        +1.

        #1
        Prashant Kommireddi, can you add the following annotations to LogicalPlanData? I imagine we continue to improve it.

        @InterfaceAudience.Public
        @InterfaceStability.Evolving
        

        #2
        Please remove trailing white spaces when committing the patch. Thanks!

        Show
        Cheolsoo Park added a comment - +1. #1 Prashant Kommireddi , can you add the following annotations to LogicalPlanData? I imagine we continue to improve it. @InterfaceAudience.Public @InterfaceStability.Evolving #2 Please remove trailing white spaces when committing the patch. Thanks!

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Prashant Kommireddi
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development