Details
- Type: Wish
- Priority: Minor
- Status: Open
- Resolution: Unresolved
Description
Today one can schedule workflows that run on a cluster using many pre-existing artifacts. However, there are no helpful primitives for deploying those artifacts to the cluster.
For example, a workflow may use a Spark JAR (hosted in a Maven repository) stored on a cluster's HDFS to read data from a Hive table (whose schema is likely tracked in a source code management system somewhere) and use JDBC to talk to an arbitrary database off the cluster, with the target database chosen by a simple configuration file that maps each cluster environment (Dev, Beta, or PROD) to a connection. Further, the data may be verified by a simple Pig script (also stored in a source code management repository).
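The environment-to-database mapping mentioned above could look like the following sketch. This is a hypothetical properties file; the key names and JDBC URLs are assumptions for illustration, not part of the original issue.

```
# Hypothetical mapping of cluster environment to JDBC endpoint
# (hosts and database names are illustrative only)
dev.jdbc.url=jdbc:postgresql://dev-db.example.com:5432/warehouse
beta.jdbc.url=jdbc:postgresql://beta-db.example.com:5432/warehouse
prod.jdbc.url=jdbc:postgresql://prod-db.example.com:5432/warehouse
```

A deployment primitive could select the right key based on which cluster the workflow lands on, rather than each user wiring this up by hand.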
As a user, I'd like some way to get my binaries and ASCII configuration onto a cluster without significant ad hoc shell actions.
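For concreteness, the "ad hoc shell actions" a user performs today might look like the following sketch. All paths, Maven coordinates, and file names are hypothetical; the `run` wrapper only echoes each command here (via `DRY_RUN`) so the sequence can be shown without a live cluster.

```shell
#!/bin/sh
# Sketch of today's manual deployment; everything below is illustrative.
# run() echoes each command when DRY_RUN is set instead of executing it,
# since mvn/hdfs are only available on a real client machine.
DRY_RUN=1
run() {
  if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi
}

APP_DIR=/user/alice/myapp   # assumed HDFS application directory

# 1. Fetch the Spark JAR from a Maven repository (hypothetical coordinates)
run mvn dependency:copy -Dartifact=com.example:etl-job:1.0.0 -DoutputDirectory=build

# 2. Push the binary, per-environment configuration, and Pig verifier to HDFS
run hdfs dfs -mkdir -p "$APP_DIR/lib" "$APP_DIR/conf"
run hdfs dfs -put -f build/etl-job-1.0.0.jar "$APP_DIR/lib/"
run hdfs dfs -put -f conf/db.properties "$APP_DIR/conf/"
run hdfs dfs -put -f scripts/verify.pig "$APP_DIR/"
```

Each of these steps is exactly the kind of copy-and-stage work that a first-class deployment primitive could absorb.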
Attachments
1. Git action | Closed | Clay B.
2. Oozie Maven Action | Open | Unassigned