Currently the Oozie share lib is a single directory with the JARs required by the different action types (mapreduce-streaming & pig).
As more action types are added to Oozie (e.g. Hive & Sqoop), the number of JARs in the Oozie share lib will grow significantly.
This creates a few issues:
- The classpath for an action grows significantly
- For a given action only a portion of the classpath is useful; the rest is dead code
- As more actions are added, the chance of conflicting dependencies grows
As I'm working on integrating the Hive action (OOZIE-68), I'm running into the issue described in the third bullet item. Pig 0.9.0 requires antlr-runtime 3.4 to work properly, while Hive requires antlr-runtime 3.0.1 to work properly.
Because the current sharelib aggregates all dependencies and resolves each of them to a single version, only one version of antlr-runtime makes it into the share lib. As a result, either Pig or Hive works, but not both.
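To illustrate why a flat sharelib cannot hold both versions, here is a minimal sketch. The class and method names are hypothetical, and the "last entry wins" rule is only a stand-in for whatever resolution the build actually applies; the point is that one key per artifact leaves room for one version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatShareLibSketch {
    // Hypothetical helper: collapse "artifact:version" entries into a single
    // version per artifact. The last entry seen wins, which is an assumption
    // made for illustration, not the real resolution rule.
    static Map<String, String> flatten(List<String> deps) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String dep : deps) {
            String[] parts = dep.split(":", 2);
            resolved.put(parts[0], parts[1]);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Pig needs antlr-runtime 3.4, Hive needs 3.0.1; only one survives.
        System.out.println(flatten(List.of("antlr-runtime:3.4", "antlr-runtime:3.0.1")));
    }
}
```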
This JIRA proposes to add one subdirectory per action (when the action requires specific JARs) to the Oozie share lib, for example one subdirectory for Pig and one for Hive.
Then, the ActionExecutor for each action type will add to the action classpath all the JARs for the corresponding action only.
This would move the resolution for the Oozie share lib from the submit-command to the action-executor.
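The per-action lookup described above could be sketched roughly as follows. This is not the Oozie implementation: the class and method names are hypothetical, the subdirectory is assumed to be named after the action type, and the local filesystem stands in for wherever the share lib is actually stored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ActionShareLibSketch {
    // Hypothetical helper: return the JARs under <shareLibRoot>/<actionType>,
    // or an empty list when the action type has no subdirectory of its own.
    static List<Path> jarsForAction(Path shareLibRoot, String actionType) throws IOException {
        Path actionDir = shareLibRoot.resolve(actionType);
        if (!Files.isDirectory(actionDir)) {
            return List.of();
        }
        // Files.list returns a stream that must be closed after use.
        try (Stream<Path> entries = Files.list(actionDir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

With this layout, an executor for the Pig action would only ever see the JARs in the Pig subdirectory, so the Pig and Hive versions of antlr-runtime never share a classpath.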
Note that this change will not break workflow applications using the Oozie system share library, as the change will be transparent to applications.
Finally, the sharelib Maven submodule becomes an aggregator for the sharelibs of each action.