  1. Pig
  2. PIG-2587

Compute LogicalPlan signature and store in job conf

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None

      Description

      We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the LogicalPlan. Here's the proposal:

      1. Add a new method LogicalPlan.getSignature() that returns a hash of its LogicalPlanPrinter output.
      2. In PigServer.execute() set the signature on the job conf after the LP is compiled, but before it's executed.
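
      A minimal sketch of step 1, assuming the printed plan is already available as a String. The issue does not say which hash function to use, so SHA-256 is an assumption here, and PlanSignature is a hypothetical helper rather than Pig's actual code. The sample plan text is made up and does not reflect the real LogicalPlanPrinter format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Pig.
final class PlanSignature {
    private PlanSignature() {}

    // Returns a hex-encoded SHA-256 digest of the printed plan text.
    // Assumption: the issue only asks for "a hash"; SHA-256 is one possible choice.
    static String of(String printedPlan) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(printedPlan.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up plan text standing in for the printer output.
        String printedPlan = "load -> filter -> group -> store";
        System.out.println(PlanSignature.of(printedPlan));
    }
}
```

      Hashing the printed form means two runs get the same signature only if the printer emits identical text for both, so whatever the printer includes (for example, input or output paths) would affect whether re-executions match.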

      (1) would allow an implementation of PigProgressNotificationListener.setScriptPlan() to save the LP signature with the script metadata. On subsequent runs, (2) would allow an implementation of PigReducerEstimator (see PIG-2574) to retrieve the current LP signature and fetch the historical data for the script. It could then use data from previous runs to better estimate the number of reducers.
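
      The save-then-look-up flow described above could be sketched as follows. This is an illustration under stated assumptions, not the Pig API: java.util.Properties stands in for the job conf, an in-memory map stands in for the historical store, and the property key and class name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; names and the property key are assumptions, not Pig's.
final class SignatureLookupSketch {
    // Assumed key under which the signature is set on the job conf.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Stand-in for stored script metadata, keyed by plan signature.
    private final Map<String, Integer> reducersBySignature = new HashMap<>();

    // Listener side: remember how many reducers a run with this signature used.
    void recordRun(String signature, int reducersUsed) {
        reducersBySignature.put(signature, reducersUsed);
    }

    // Estimator side: read the signature from the conf and reuse history if any.
    int estimateReducers(Properties jobConf, int fallback) {
        String signature = jobConf.getProperty(SIGNATURE_KEY);
        if (signature == null) {
            return fallback; // no signature set, so nothing to look up
        }
        return reducersBySignature.getOrDefault(signature, fallback);
    }
}
```

      A real estimator would likely derive its estimate from richer historical data than a single reducer count; the single integer here only keeps the sketch short.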

              People

              • Assignee:
                billgraham Bill Graham
                Reporter:
                billgraham Bill Graham