Pig / PIG-2587

Compute LogicalPlan signature and store in job conf

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels: None

    Description

      We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the LogicalPlan. Here's the proposal:

      1. Add a new method LogicalPlan.getSignature() that returns a hash of its LogicalPlanPrinter output.
      2. In PigServer.execute() set the signature on the job conf after the LP is compiled, but before it's executed.
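
      A minimal sketch of the two steps, not the attached patch. It assumes the printed plan is already available as a String (placeholder text stands in for real LogicalPlanPrinter output), uses a plain Map in place of the job conf, and picks SHA-256 as the hash. The class name, the helper names, and the key "example.logical.plan.signature" are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class PlanSignatureSketch {
    // Hypothetical conf key; the real property name is not given in this issue.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";

    // Stand-in for LogicalPlan.getSignature(): hash the printed plan text.
    // SHA-256 is an assumption; the issue only says "a hash".
    static String signatureOf(String printedPlan) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(printedPlan.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for step 2: record the signature on the job conf
    // after the plan is compiled and before it is executed.
    static void storeSignature(Map<String, String> jobConf, String printedPlan) {
        jobConf.put(SIGNATURE_KEY, signatureOf(printedPlan));
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        // Placeholder text, not real LogicalPlanPrinter output.
        storeSignature(jobConf, "load -> group -> store");
        System.out.println(jobConf.get(SIGNATURE_KEY));
    }
}
```

      For the signature to match across runs with different inputs/outputs, the hashed text would have to leave out input and output locations; the sketch does not address how the printer output achieves that.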

      (1) would allow an impl of PigProgressNotificationListener.setScriptPlan() to save the LP signature with the script metadata. Upon subsequent runs, (2) would allow an impl of PigReducerEstimator (see PIG-2574) to retrieve the current LP signature and fetch the historical data for the script. It could then use the previous run data to better estimate the number of reducers.
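
      A rough sketch of that lookup flow, independent of the real PigReducerEstimator interface. It assumes the signature is stored in the conf under a hypothetical key and that history is a simple map from signature to the reducer count that worked on an earlier run; both maps, the class name, and the default of 1 are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class ReducerHistorySketch {
    // Hypothetical conf key under which the plan signature was stored.
    static final String SIGNATURE_KEY = "example.logical.plan.signature";
    // Fallback when the script has no recorded history (assumed value).
    static final int DEFAULT_REDUCERS = 1;

    // Read the current signature from the conf and look up previous-run data.
    static int estimateReducers(Map<String, String> jobConf,
                                Map<String, Integer> history) {
        String signature = jobConf.get(SIGNATURE_KEY);
        if (signature == null) {
            return DEFAULT_REDUCERS;
        }
        return history.getOrDefault(signature, DEFAULT_REDUCERS);
    }

    public static void main(String[] args) {
        // History as it might have been saved by a listener on an earlier run.
        Map<String, Integer> history = new HashMap<>();
        history.put("abc123", 12);

        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(SIGNATURE_KEY, "abc123");

        System.out.println(estimateReducers(jobConf, history)); // prints 12
    }
}
```

      A real estimator would likely combine the historical figure with current input sizes rather than reuse it directly.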

      Attachments

        1. pig-2587_1.patch
          2 kB
          William W. Graham Jr

            People

              Assignee: William W. Graham Jr
              Reporter: William W. Graham Jr
              Votes: 0
              Watchers: 2
