Pig / PIG-2723

Pig fails when pig.jar is removed during a job

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Many pig scripts are executed as a series of MR jobs. When submitting each MR job to the cluster, pig makes a single fat job.jar that contains pig itself, along with all registered jars (UDFs and their dependencies).

      This becomes problematic when pig is upgraded during a long-running job, for example going from pig-0.9.2+1.jar to pig-0.9.2+2.jar. When creating the next job.jar, pig will fail because the expected pig jar is no longer available.

      A common case where this happens is deploying a new pig RPM.

      Pig should handle the case where its jar is removed while executing a script.

      DISCUSSED OPTIONS THAT SEEM PROBLEMATIC:

      • Creating a single pig.jar symlink that points at the installed pig version could cause MR jobs within the same script to use different pig versions. This could lead to issues that are very difficult to debug, as well as potential correctness issues.
      • Extracting pig.jar once for the whole job could be problematic if the copy lives in /tmp and something like tmpwatch deletes it while the script is still running.

      POSSIBLE SOLUTION:

      Pig could put pig.jar in the distributed cache once and reuse that jar on HDFS for all launched jobs.
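      A minimal sketch of the "stage once, reuse for every job" idea, in Java. It is not Pig code: the class name PigJarStager, its methods, and the content-hash naming scheme are hypothetical, and a local staging directory stands in for HDFS. In real Pig the staged copy would be uploaded to HDFS and added to each job's distributed cache rather than bundled into every job.jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch: copy the installed pig jar once, when the script starts,
 * and hand the same staged copy to every MR job the script launches.
 * The local stagingDir is an assumed stand-in for a directory on HDFS.
 */
public final class PigJarStager {
    private final Path stagedJar;

    /** Call once at script start, while the installed jar is known to exist. */
    public PigJarStager(Path installedJar, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        // Name the copy by content hash so different pig builds never share a staged file.
        Path target = stagingDir.resolve("pig-" + sha256Hex(installedJar) + ".jar");
        try {
            Files.copy(installedJar, target);
        } catch (FileAlreadyExistsException e) {
            // Same build already staged (e.g. by another script): reuse it.
        }
        this.stagedJar = target;
    }

    /** Every subsequent job uses this path; the installed jar is never read again. */
    public Path jarForNextJob() {
        return stagedJar;
    }

    private static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

      Because the staged name is derived from the jar's contents, an RPM upgrade that removes the installed jar mid-script does not affect later jobs, and a script started after the upgrade stages a different file, so one script never mixes pig versions (the concern raised about the symlink option).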

      WHY ARE YOU DELETING PIG.JAR DURING THE JOB!?!?

      Allowing RPM upgrades mid-pig-job means the machine does not need to be drained for maintenance, reducing the impact of upgrades. Having just one pig version installed simplifies packaging and makes it easier for users to choose the right version. Overall it just keeps things simple, which is a feature in itself.

          People

            Assignee: Unassigned
            Reporter: Travis Crawford (traviscrawford)
            Votes: 0
            Watchers: 4