Details
- Type: New Feature
- Status: Resolved
- Priority: Critical
- Resolution: Won't Fix
- Fix Version: 14.1
- Component: None
- Labels: None
Description
Have a [WIP] Dockerfile which (assuming a binary release) pulls the appropriate version of Spark and places Spark and Mahout in /opt/spark and /opt/mahout respectively.
Would like to add full Mahout build capabilities (this should not be difficult) in a second file.
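A minimal sketch of what the binary-release variant could look like (base image, versions, and download URLs here are illustrative assumptions, not the actual [WIP] file, which also assumes curl is available in the base image):

```dockerfile
# Illustrative sketch only: pull binary releases of Spark and Mahout and
# unpack them into /opt/spark and /opt/mahout. Versions/URLs are assumptions.
FROM openjdk:8-jdk

ENV SPARK_VERSION=2.4.4 \
    MAHOUT_VERSION=14.1

# Pull a binary Spark release and unpack it into /opt/spark.
RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz" \
      | tar -xz -C /opt \
 && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop2.7" /opt/spark

# Pull a binary Mahout release and unpack it into /opt/mahout.
RUN curl -fsSL "https://archive.apache.org/dist/mahout/${MAHOUT_VERSION}/apache-mahout-distribution-${MAHOUT_VERSION}.tar.gz" \
      | tar -xz -C /opt \
 && mv "/opt/apache-mahout-distribution-${MAHOUT_VERSION}" /opt/mahout

ENV SPARK_HOME=/opt/spark \
    MAHOUT_HOME=/opt/mahout

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```

The "full build" second file would differ mainly in replacing the Mahout download with a `git clone` plus a Maven build step.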
These files currently use an ENTRYPOINT ["entrypoint.sh"] command and some environment variables (none uncommon to Spark or Mahout, aside from a $MAHOUT_CLASSPATH env variable).
The entrypoint.sh essentially checks whether the command is for a worker or a driver, and runs accordingly. Currently I'm just dumping the entire $MAHOUT_HOME/lib/*.jar into $MAHOUT_CLASSPATH and adding it to the SPARK_CLASSPATH.
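A sketch of such an entrypoint (the dispatch logic, paths, and script names here are assumptions, not the actual [WIP] file; `start-slave.sh` is the Spark 2.x worker launcher, renamed `start-worker.sh` in 3.x):

```shell
#!/bin/sh
# Hypothetical entrypoint.sh sketch: build $MAHOUT_CLASSPATH from every jar
# under $MAHOUT_HOME/lib, append it to SPARK_CLASSPATH, then dispatch on the
# first argument: "worker" or "driver".

build_mahout_classpath() {
  # Glob every jar in $MAHOUT_HOME/lib and join with ':' (no-space paths assumed).
  echo "$MAHOUT_HOME"/lib/*.jar | tr ' ' ':'
}

main() {
  export SPARK_HOME=${SPARK_HOME:-/opt/spark}
  export MAHOUT_HOME=${MAHOUT_HOME:-/opt/mahout}

  MAHOUT_CLASSPATH=$(build_mahout_classpath)
  export MAHOUT_CLASSPATH
  export SPARK_CLASSPATH="${SPARK_CLASSPATH:+$SPARK_CLASSPATH:}$MAHOUT_CLASSPATH"

  case "$1" in
    worker)
      # Start a Spark worker against the master URL given as the second argument.
      exec "$SPARK_HOME/sbin/start-slave.sh" "$2"
      ;;
    driver)
      shift
      # Hand the remaining arguments (--class, jar, app args) to spark-submit.
      exec "$SPARK_HOME/bin/spark-submit" "$@"
      ;;
    *)
      echo "usage: entrypoint.sh {worker <master-url>|driver <spark-submit args...>}" >&2
      return 1
      ;;
  esac
}

# Dispatch only when invoked with arguments, so the functions can be sourced.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```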
If entrypoint.sh detects a driver, it will launch spark-submit. IIRC (though I don't trust my memory here), spark-submit can handle any driver. pferrel, does this sound correct? Otherwise we just add the Mahout class to be passed to spark-submit as a command parameter.
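The explicit-class alternative would amount to something like the following (the driver class and jar name are illustrative assumptions; the command is echoed rather than exec'd so the sketch is inspectable):

```shell
#!/bin/sh
# Sketch of the alternative: pass the Mahout driver class to spark-submit
# explicitly via --class. All names below are illustrative assumptions.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
MAHOUT_HOME=${MAHOUT_HOME:-/opt/mahout}
MASTER=${MASTER:-spark://spark-master:7077}
DRIVER_CLASS=${DRIVER_CLASS:-org.apache.mahout.drivers.ItemSimilarityDriver}

# Echoed instead of exec'd; in the real entrypoint this would be exec'd.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit --master $MASTER --class $DRIVER_CLASS $MAHOUT_HOME/mahout-spark.jar"
echo "$SUBMIT_CMD"
```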
Though this may be better to migrate in 14.2 or 15.0 to an entirely new build chain, e.g. CMake (which I would suggest), given our large amount of native code (hopefully soon to be added).
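If the build chain did move to CMake, the top level could start as small as this (entirely illustrative; project, target, and directory names are assumptions, not a concrete proposal):

```cmake
# Illustrative top-level CMakeLists.txt for building the native pieces.
cmake_minimum_required(VERSION 3.10)
project(mahout-native LANGUAGES CXX)

# Native math kernels (e.g. ViennaCL-backed solvers) would live here.
add_subdirectory(native)

# Optional GPU backend, toggled at configure time.
option(MAHOUT_USE_CUDA "Build the CUDA-backed solvers" OFF)
if(MAHOUT_USE_CUDA)
  enable_language(CUDA)
endif()
```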
Though it's nearly finished, we may want to punt this Dockerfile for 14.1, or mark it Experimental, which is likely the better option.
Attachments
Issue Links
- is related to MAHOUT-2025: Add a script to launch an EC2 cluster, install Mahout and JCuda or ViennaCL to Examples (Resolved)