Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently the shim layer requires recompiling Pig with one version of Hadoop or the other.
The drawback is that users must use the version of Pig corresponding to their cluster.
Also, the current shim layer reuses the same class names for every Hadoop version, which makes it difficult to maintain and debug.
Another goal of this refactoring is enabling modularization of the build. The current mechanism requires compiling everything together.
Instead, we should do the following:
- Define the Shim interface (pig-shim-api) that Pig (pig-core) depends on for compilation.
- Have an implementation for each supported version of Hadoop, each with a different class name (pig-shim-hadoop20 and pig-shim-hadoop23). These would depend on the shim interface and the corresponding version of Hadoop, but not on Pig.
- Then provide a jar (pig-without-hadoop) containing Pig, the shim interface, and the implementations. At runtime, determine which shim implementation to use.
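The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the names `HadoopShim`, `Hadoop20Shim`, `Hadoop23Shim`, and `ShimLoader`, the single `name()` method, and the version-prefix matching are all hypothetical. For brevity the loader references the implementation classes directly; in the layout described above, pig-core would compile only against pig-shim-api, so the lookup would more likely go through reflection or a similar mechanism.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical interface that would live in pig-shim-api.
// pig-core would compile against this type only.
interface HadoopShim {
    // Illustrative method; a real shim would expose the
    // version-specific Hadoop operations that Pig needs.
    String name();
}

// Hypothetical implementation that would live in pig-shim-hadoop20.
final class Hadoop20Shim implements HadoopShim {
    @Override
    public String name() {
        return "hadoop20";
    }
}

// Hypothetical implementation that would live in pig-shim-hadoop23.
// Note the distinct class name, unlike the current shim layer.
final class Hadoop23Shim implements HadoopShim {
    @Override
    public String name() {
        return "hadoop23";
    }
}

// Hypothetical runtime selector. The prefix-to-shim mapping is an
// assumption made for this sketch.
final class ShimLoader {
    private static final Map<String, Supplier<HadoopShim>> SHIMS = new LinkedHashMap<>();

    static {
        SHIMS.put("0.20.", Hadoop20Shim::new);
        SHIMS.put("0.23.", Hadoop23Shim::new);
    }

    private ShimLoader() {
    }

    // Picks a shim from the Hadoop version string found at runtime.
    static HadoopShim select(String hadoopVersion) {
        for (Map.Entry<String, Supplier<HadoopShim>> entry : SHIMS.entrySet()) {
            if (hadoopVersion.startsWith(entry.getKey())) {
                return entry.getValue().get();
            }
        }
        throw new IllegalArgumentException(
                "No shim available for Hadoop version " + hadoopVersion);
    }
}
```

A caller would pass in whatever version string the Hadoop jars on the classpath report, for example `ShimLoader.select("0.23.1")`, and then work only with the returned `HadoopShim`.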
Summary of dependencies:
pig-shim-api (no or limited dependencies)
pig-shim-hadoop20 -> pig-shim-api, hadoop20
pig-shim-hadoop23 -> pig-shim-api, hadoop23
pig-core -> pig-shim-api (and existing dependencies)
// tests can be run against one or the other.
pig-core-tests -> pig-core, pig-shim-api, pig-shim-hadoop20, pig-shim-hadoop23, hadoop
pig-without-hadoop -> pig-core, pig-shim-api, pig-shim-hadoop20, pig-shim-hadoop23
Issue Links
- is depended upon by: PIG-2599 Mavenize Pig (Open)