PIG-2587

Compute LogicalPlan signature and store in job conf

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels: None

      Description

      We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the LogicalPlan. Here's the proposal:

      1. Add a new method LogicalPlan.getSignature() that returns a hash of its LogicalPlanPrinter output.
      2. In PigServer.execute() set the signature on the job conf after the LP is compiled, but before it's executed.
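
The two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: the printed plan is passed in as a plain `String` standing in for LogicalPlanPrinter output, `java.util.Properties` stands in for the job conf, SHA-256 is one possible choice of hash, and the class name `PlanSignatureSketch` and the key `example.logical.plan.signature` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;

class PlanSignatureSketch {
    // Hypothetical conf key used only for this illustration; not taken from Pig.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Step 1: hash the printed form of the plan into a stable hex string.
    // Assumes printedPlan is the text a plan printer would produce.
    static String signatureOf(String printedPlan) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(printedPlan.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Step 2: store the signature on the (stand-in) job conf after the plan
    // is compiled and before the job is launched.
    static void storeSignature(Properties jobConf, String printedPlan) {
        jobConf.setProperty(SIGNATURE_KEY, signatureOf(printedPlan));
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        storeSignature(jobConf, "example printed plan text");
        System.out.println(jobConf.getProperty(SIGNATURE_KEY));
    }
}
```

Because the signature depends only on the printed plan text, two runs whose plans print identically get the same signature; whether differing input/output paths change the printed text depends on what the printer emits.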

      (1) would allow an implementation of PigProgressNotificationListener.setScriptPlan() to save the LogicalPlan signature with the script metadata. On subsequent runs, (2) would allow an implementation of PigReducerEstimator (see PIG-2574) to retrieve the current signature and fetch the historical data for the script. It could then use data from previous runs to better estimate the number of reducers.
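
As a rough sketch of that flow, the example below keeps an in-memory history keyed by signature. Everything in it is an assumption made for illustration: the class `SignatureHistorySketch`, the key `example.logical.plan.signature`, the use of `java.util.Properties` as a stand-in for the job conf, and the estimation rule (average reducer input bytes from earlier runs divided by a bytes-per-reducer target). It does not use the real PigProgressNotificationListener or PigReducerEstimator interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class SignatureHistorySketch {
    // Hypothetical conf key used only for this illustration; not taken from Pig.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Signature -> reducer input bytes observed in earlier runs of that plan.
    private final Map<String, List<Long>> history = new HashMap<>();

    // Listener role: save what was observed for a run under its plan signature.
    void recordRun(String signature, long reducerInputBytes) {
        history.computeIfAbsent(signature, k -> new ArrayList<>()).add(reducerInputBytes);
    }

    // Estimator role: read the signature from the conf, look up earlier runs,
    // and derive a reducer count; fall back when there is no history.
    int estimateReducers(Properties jobConf, long bytesPerReducer, int fallback) {
        String signature = jobConf.getProperty(SIGNATURE_KEY);
        List<Long> runs = (signature == null) ? null : history.get(signature);
        if (runs == null || runs.isEmpty()) {
            return fallback;
        }
        long total = 0;
        for (long bytes : runs) {
            total += bytes;
        }
        long average = total / runs.size();
        // Ceiling division, with a minimum of one reducer.
        long reducers = (average + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, reducers);
    }
}
```

A real implementation would persist the history outside the process so that later runs of the same script can see it.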

            People

            • Assignee: Bill Graham
            • Reporter: Bill Graham
            • Votes: 0
            • Watchers: 2
