Mahout / MAHOUT-2099

Using Mahout as a Library in Spark Cluster


    Details

    • Type: Question
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cooccurrence, Math
    • Labels: None
    • Environment:

      Spark version 2.3.0.2.6.5.10-2

      Description

      I have a Spark cluster already set up. The environment is not under my direct control, but the administrators do allow fat JARs to be deployed with their dependencies. I packaged my Spark application with some Mahout code for SimilarityAnalysis, added the Mahout libraries to the POM file, and the fat JAR builds successfully.

      The problem, however, is that I get the following error when I use the existing SparkContext to build a SparkDistributedContext for Mahout.


      pom.xml
      
      {...}
      
      <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-math</artifactId>
        <version>0.13.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-math-scala_2.10</artifactId>
        <version>0.13.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-spark_2.10</artifactId>
        <version>0.13.0</version>
      </dependency>
      <dependency>
        <groupId>com.esotericsoftware</groupId>
        <artifactId>kryo</artifactId>
        <version>5.0.0-RC5</version>
      </dependency>
      
       

       

      Code:

      
      import org.apache.spark.SparkContext
      import org.apache.mahout.sparkbindings._

      implicit val sc: SparkContext = sparkSession.sparkContext

      implicit val msc: SparkDistributedContext = sc2sdc(sc)
      
      Error:
      
      ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result: org.apache.mahout.math.DenseVector
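      
      The error suggests the cluster is still using plain Java serialization for Mahout vectors. A sketch of what I am considering, assuming mahout-spark_2.10 0.13.0 ships org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator (the registrator that mahoutSparkContext() would normally configure for you), is to enable Kryo and that registrator on the SparkConf before the session is created:
      
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SparkSession
      import org.apache.mahout.sparkbindings._

      // Assumption: MahoutKryoRegistrator is on the classpath via the mahout-spark
      // dependency packaged in the fat JAR; setting it here mirrors what
      // mahoutSparkContext() is expected to do internally.
      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")

      val sparkSession = SparkSession.builder()
        .appName("CooccurrenceDriver")
        .config(conf)
        .getOrCreate()

      implicit val sc: SparkContext = sparkSession.sparkContext
      implicit val msc: SparkDistributedContext = sc2sdc(sc)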
      
       
      
      And if I try to build the context using mahoutSparkContext() instead, it gives me an error that MAHOUT_HOME is not set.
      
      Code:
      
      implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")
      
      Error:
      
      MAHOUT_HOME is required to spawn mahout-based spark jobs
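      
      Since the Mahout jars are already packaged inside the fat JAR, I assume the MAHOUT_HOME jar scan can be skipped. A sketch, assuming mahoutSparkContext() in 0.13.0 accepts a sparkConf and an addMahoutJars flag that disables that scan (I have not confirmed this signature):
      
      import org.apache.spark.SparkConf
      import org.apache.mahout.sparkbindings._

      // Assumption: addMahoutJars = false skips the MAHOUT_HOME lookup, because the
      // Mahout classes are already on the classpath from the fat JAR.
      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")

      implicit val msc = mahoutSparkContext(
        masterUrl = "local",
        appName = "CooccurrenceDriver",
        sparkConf = conf,
        addMahoutJars = false)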
      
       

      My question is: how do I proceed in this situation? Do I have to ask the administrators of the Spark environment to install the Mahout libraries, or is there any way I can proceed by packaging my application as a fat JAR?

        People

        • Assignee: Unassigned
        • Reporter: tariqjawed83 Tariq Jawed
        • Votes: 0
        • Watchers: 2