Pig / PIG-2587

Compute LogicalPlan signature and store in job conf

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None

      Description

      We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the LogicalPlan. Here's the proposal:

      1. Add a new method LogicalPlan.getSignature() that returns a hash of its LogicalPlanPrinter output.
      2. In PigServer.execute(), set the signature on the job conf after the LP is compiled, but before it's executed.

      (1) would allow an impl of PigProgressNotificationListener.setScriptPlan() to save the LP signature with the script metadata. Upon subsequent runs, (2) would allow an impl of PigReducerEstimator (see PIG-2574) to retrieve the current LP signature and fetch the historical data for the script. It could then use the previous run data to better estimate the number of reducers.
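
A minimal sketch of the two proposed steps, written without any Pig dependency so that it runs on its own. The class name, the property key, and the use of java.util.Properties as a stand-in for the job conf are all assumptions made for illustration; the printed plan text would come from LogicalPlanPrinter in the real change.

```java
import java.util.Properties;

// Illustrative only: a stand-in for LogicalPlan.getSignature() and the job conf.
final class PlanSignatureSketch {
    // Hypothetical property key; this issue does not state the key the patch uses.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Step 1: hash the printed form of the plan.
    // String.hashCode() is specified by the Java API, so the value is stable across JVMs.
    static int signatureOf(String printedPlan) {
        return printedPlan.hashCode();
    }

    // Step 2: store the signature in the conf after compilation, before execution.
    static void storeSignature(Properties conf, String printedPlan) {
        conf.setProperty(SIGNATURE_KEY, Integer.toString(signatureOf(printedPlan)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        storeSignature(conf, "printed plan text goes here");
        // A later component (for example a reducer estimator) could read it back:
        System.out.println(conf.getProperty(SIGNATURE_KEY));
    }
}
```

Because the signature is a plain hash of the printed plan, two runs match only when the printed plans are character-for-character identical.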

          Activity

          Bill Graham added a comment -

          Here's a first pass of a proposed implementation.

          Jonathan Coveney added a comment -

          Bill,

          There are a couple of ways to implement a signature like this. One is to just do the hashCode, which is what you did. That will be good for identical scripts. I wonder if it might be worth thinking about some sort of value that wouldn't change with cosmetic changes to the script (i.e., alias changes and the like)? I guess a signature is one thing, and the hashCode would be adequate, but ideally, as long as the sources and transformations are the same, you'd want cosmetic changes not to throw out the tuning you've done.

          Is that crazy talk? 80/20 may dictate just going with this approach since it is so simple and saving the bigger optimization for external systems.
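
To make the alias-insensitive idea concrete, here is a rough sketch under stated assumptions. It is not what the patch does. The Op class is a hypothetical toy model of a plan operator, not a Pig class, and the plan is assumed to be listed with inputs before the operators that consume them. Each input alias is replaced by the position of the operator that produced it, so renaming aliases leaves the hash unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy plan model, not Pig's LogicalPlan.
final class AliasInsensitiveSignature {
    static final class Op {
        final String alias;        // user-chosen name, ignored by the signature
        final String kind;         // e.g. "LOAD", "FILTER"
        final String detail;       // alias-free description of the operation
        final List<String> inputs; // aliases of the operators feeding this one

        Op(String alias, String kind, String detail, String... inputs) {
            this.alias = alias;
            this.kind = kind;
            this.detail = detail;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Hash the operator kinds, details, and wiring; alias names never enter the hash.
    static int signatureOf(List<Op> planInOrder) {
        Map<String, Integer> positionOfAlias = new HashMap<>();
        List<String> canonical = new ArrayList<>();
        for (Op op : planInOrder) {
            StringBuilder sb = new StringBuilder(op.kind).append('|').append(op.detail);
            for (String input : op.inputs) {
                // Refer to each input by where it was defined, not by its name.
                sb.append('|').append(positionOfAlias.get(input));
            }
            positionOfAlias.put(op.alias, canonical.size());
            canonical.add(sb.toString());
        }
        return canonical.hashCode();
    }

    public static void main(String[] args) {
        List<Op> original = Arrays.asList(
            new Op("raw", "LOAD", "PigStorage"),
            new Op("big", "FILTER", "x > 1", "raw"));
        List<Op> renamed = Arrays.asList(
            new Op("A", "LOAD", "PigStorage"),
            new Op("B", "FILTER", "x > 1", "A"));
        System.out.println(signatureOf(original) == signatureOf(renamed));
    }
}
```

The cost is the one raised in this thread: any later change to the canonical form changes every signature and invalidates whatever was stored under the old ones.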

          Julien Le Dem added a comment -

          @Jonathan I think getting the signature exactly right would be hard, with the extra issue that every change to improve the signature instantly invalidates any cache based on the signature. The case where the script is modified in a way that doesn't change anything in the physical plan seems marginal.

          This looks good to me.

          Outside the scope of this patch: things that also impact the physical plan and should probably be used as part of the lookup:

          • version of Pig
          • optimizer flags
          • version of registered jars
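
One way such a lookup could be keyed, sketched under assumptions since it is outside this patch: combine the plan signature with the three items above into a single string. The class name, the key layout, and the sample flag and jar names are hypothetical. Flags and jars are sorted so that the order in which they were set or registered does not change the key.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Illustrative only: a hypothetical lookup key for historical run data.
final class HistoryLookupKey {
    static String of(int planSignature,
                     String pigVersion,
                     Collection<String> optimizerFlags,
                     Collection<String> registeredJars) {
        // TreeSet sorts and de-duplicates, making the key order-independent.
        TreeSet<String> flags = new TreeSet<>(optimizerFlags);
        TreeSet<String> jars = new TreeSet<>(registeredJars);
        return planSignature + "/" + pigVersion
            + "/" + String.join(",", flags)
            + "/" + String.join(",", jars);
    }

    public static void main(String[] args) {
        String key = of(12345, "0.11",
            Arrays.asList("flagB", "flagA"),
            Arrays.asList("udfs-1.0.jar"));
        System.out.println(key);
    }
}
```

The key only reflects the jar names the caller passes in; it cannot see changes inside a jar or in anything that jar depends on.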
          Bill Graham added a comment -

          I agree if cosmetic changes happen to the script, all bets are off and you'll get a different signature.

          Also agree about the 3 items out of scope here. The version of registered jars part would be ugly due to potential transitive dependencies changing and not being detected.

          Julien Le Dem added a comment -

          +1

          Bill Graham added a comment -

          Committed.


            People

            • Assignee: Bill Graham
            • Reporter: Bill Graham
            • Votes: 0
            • Watchers: 2