[PIG-2686] refactor shim layer to have a universal Pig jar working on all supported Hadoop versions - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: impl
Labels:
None

Description

Currently the shim layer requires recompiling Pig with one version of Hadoop or the other.
The drawback is that users must use the version of Pig corresponding to their cluster.
Also the current shim layer reuses the same class names which makes it difficult to maintain and debug.
Another goal of this refactoring is enabling modularization of the build. The current mechanism requires compiling everything together.

Instead we should do as follow:

Define the Shim interface (pig-shim-api) that Pig (pig-core) depends on for compilation.
Have an implementation for each supported version of Hadoop with a different class name (pig-shim-hadoop20 and pig-shim-hadoop23). those would depend on the shim interface and the corresponding version of hadoop but not Pig.
Then provide a jar (pig-without-hadoop) containing pig, the shim interface, the implementations. At runtime determine which shim implementation to run.

summary of dependencies:
pig-shim-api (no or limited dependencies)
pig-shim-hadoop20 -> pig-shim-api, hadoop23
pig-shim-hadoop23 -> pig-shim-api, hadoop20
pig-core -> pig-shim-api (and existing dependencies)
// tests can be run against one or the other.
pig-core-tests -> pig-core, pig-shim-api, pig-shim-hadoop20, pig-shim-hadoop23, hadoop

{version}

pig-without-hadoop -> pig-core, pig-shim-api, pig-shim-hadoop20, pig-shim-hadoop23

Attachments

Issue Links

is depended upon by

PIG-2599 Mavenize Pig

Open

Activity

People

Assignee:: Unassigned

Reporter:: Julien Le Dem

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 07/May/12 19:12

Updated:: 02/May/13 02:29