Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Fix Version: 0.2.0
Description
A thin framework that can submit an MR job, run it, and report results. Some thoughts:
- Most probably it will be a server-side daemon
- JSON over HTTP with REST semantics
- Preliminary top-level functions:
  - Accept a job and its components at a well-known URL
  - Parse & create the MR workflow
  - Create & store a job context: ID, security artifacts, et al.
  - Return a status URL (can be used to query status or kill the job); this is the REST model
  - Run the job (might include dynamic elastic cloud provisioning, e.g. on OpenStack)
  - As the job runs, collect status and store it in the job context
  - If the client queries, return the status
  - Once the job is done, store the status and return results (most probably pointers to files and so forth)
  - Calculate & store performance metrics
  - Calculate & store chargeback in generic units (e.g. CPU, memory, network, storage)
  - As and when the client asks, return job results
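The submit/status/kill flow above could be sketched as follows. This is a minimal in-memory illustration, not a real HTTP server: the `JobServer` class, the `/jobs/<id>/status` URL shape, and all field names are assumptions made up for this sketch.

```python
import json
import uuid

class JobServer:
    """Hypothetical sketch of the submission daemon's core bookkeeping."""

    def __init__(self):
        self.contexts = {}  # job ID -> job context

    def submit(self, job_json):
        """Accept a job posted to the well-known URL; return a status URL."""
        job = json.loads(job_json)
        job_id = str(uuid.uuid4())
        # Job context: ID, the parsed job, status, and (later) result pointers.
        self.contexts[job_id] = {
            "id": job_id,
            "job": job,
            "status": "SUBMITTED",
            "results": None,
        }
        return {"statusUrl": f"/jobs/{job_id}/status"}

    def status(self, job_id):
        """Client polls the status URL; once done, result pointers come back too."""
        ctx = self.contexts[job_id]
        return {"status": ctx["status"], "results": ctx["results"]}

    def kill(self, job_id):
        """In this sketch, the same resource doubles as the kill endpoint."""
        self.contexts[job_id]["status"] = "KILLED"

# Usage: submit a job description, extract the job ID from the status URL, poll.
server = JobServer()
resp = server.submit(json.dumps({"name": "wordcount", "input": "/data/in"}))
job_id = resp["statusUrl"].split("/")[2]
print(server.status(job_id)["status"])  # SUBMITTED
```

The key design point mirrored here is that submission returns only a URL: all later interaction (querying, killing, fetching results) goes through that one resource, which is what gives the API its REST shape.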
- Some thoughts on implementation:
  - Store the context et al. in HBase
  - A Clojure implementation?
  - Packaging like OVF? (with embedded pointers to the VM, data, and so forth)
  - For the first release, assume a homogeneous Hadoop infrastructure in a cloud
  - Custom reporter/context counters?
  - Distributed cache for framework artifacts and runtime monitoring?
  - Most probably will have to use TaskRunner?
  - Extend classes with submission-framework setup and teardown code?
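The last bullet, extending classes with framework setup and teardown code, could look like the template-method sketch below. `SubmittableJob`, its hook names, and the example subclass are hypothetical, purely to show where framework code would bracket the user's job.

```python
class SubmittableJob:
    """Hypothetical base class the submission framework would provide."""

    def setup(self):
        # Framework setup: e.g. pull artifacts from the distributed cache,
        # register runtime monitoring for this job (illustrative only).
        self.events = ["setup"]

    def teardown(self):
        # Framework teardown: e.g. store metrics and chargeback into the
        # job context (illustrative only).
        self.events.append("teardown")

    def run(self):
        raise NotImplementedError("subclasses implement the actual MR job")

    def execute(self):
        """Template method: framework code brackets the user's run()."""
        self.setup()
        try:
            return self.run()
        finally:
            self.teardown()

class WordCountJob(SubmittableJob):
    def run(self):
        self.events.append("run")
        # Results are pointers to files, as described above.
        return {"output": "/results/wordcount"}

job = WordCountJob()
result = job.execute()
print(job.events)  # ['setup', 'run', 'teardown']
```

The `try`/`finally` matters: teardown (metrics, chargeback, context updates) runs even when the user's job fails, so the job context always ends in a consistent state.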