Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
0.14.0, 0.13.2, 14.1
None
Description
Newer versions of Mahout seem to have problems with the Spark bindings. For example, mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work because the JVM cannot find their main classes:
Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
By contrast, mahout spark-shell works flawlessly.
Here is a short Dockerfile to show the issue:
FROM openjdk:8-alpine

ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=0.14.0
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
    pip install setuptools && \
    mkdir -p ${MAHOUT_HOME} && \
    mkdir -p ${SPARK_BASE} && \
    curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
    tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
    rm ${SPARK_HOME}.tgz && \
    export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
    bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
    bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
        -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}

### build mahout
RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip && \
    unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
    rm ${MAHOUT_BASE}.zip && \
    cd ${MAHOUT_HOME} && \
    mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
docker build . -t mahout-test
docker run -it mahout-test /bin/bash
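Once inside the container, the failure can be checked for each driver without reading through the full output. The helper below is a hypothetical sketch (driver_class_missing is not part of Mahout): it runs a command and succeeds only if the combined stdout/stderr contains the JVM launcher message quoted above. It assumes MAHOUT_HOME points at the unpacked and built source tree, and that invoking a driver with no arguments is harmless: in the broken case the JVM fails before any argument parsing, while in a working build the driver is assumed to exit with a usage error.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Mahout: returns 0 (true) when the given
# command prints the JVM "main class not found" error from this report.
driver_class_missing() {
  # Merge stderr into stdout so the launcher error reaches grep.
  "$@" 2>&1 | grep -q 'Could not find or load main class'
}

# Example usage inside the mahout-test container; skipped when MAHOUT_HOME
# is unset or has no executable bin/mahout.
if [ -n "${MAHOUT_HOME:-}" ] && [ -x "$MAHOUT_HOME/bin/mahout" ]; then
  for sub in spark-itemsimilarity spark-rowsimilarity; do
    if driver_class_missing "$MAHOUT_HOME/bin/mahout" "$sub"; then
      echo "$sub: main class not found"
    else
      echo "$sub: no main-class error"
    fi
  done
fi
```

spark-shell is left out of the loop on purpose, since it starts an interactive session instead of exiting.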
Attachments
Issue Links
- Dependency
  - MAHOUT-2102 transitive and direct dependencies not being picked up in ~/lib by /bin/mahout (Resolved)
- is fixed by
  - MAHOUT-2100 Dependencies (Scopt e.g.) are not being picked on the classpath nor shipped in ./lib (Resolved)
- is related to
  - MAHOUT-2023 Drivers broken, scopt classes not found (Resolved)
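The linked issues trace the failure to bin/mahout not putting the jars in the lib directory (scopt among them) on the classpath. The sketch below only illustrates that idea; it is not the actual bin/mahout code or the committed fix, and both function names are hypothetical. lib_classpath joins every jar directly inside a directory into a colon-separated classpath string, and has_scopt checks whether a scopt jar is present, assuming the usual Maven file naming scopt_<scala-version>-<version>.jar.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the classpath problem from the linked
# issues; they are not taken from bin/mahout.

# Print a colon-separated classpath of every *.jar directly inside dir.
lib_classpath() {
  local dir=$1 cp="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # an unmatched glob stays literal; skip it
    cp="${cp:+$cp:}$jar"        # insert ':' only between entries
  done
  printf '%s\n' "$cp"
}

# Succeed if dir contains a scopt jar (assumes scopt_<scala>-<version>.jar naming).
has_scopt() {
  local jar
  for jar in "$1"/scopt_*.jar; do
    [ -e "$jar" ] && return 0
  done
  return 1
}
```

For example, running has_scopt "$MAHOUT_HOME/lib" || echo "scopt missing from lib" inside the container would flag the situation described in MAHOUT-2100, again assuming the jar naming above.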