Mahout / MAHOUT-1627

Problem with the ALS factorizer (MapReduce version) when running under Oozie, caused by files in the distributed cache. Error: Unable to read sequence file from cache.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • 0.10.2
    • 0.12.0
    • None
    • Hadoop

    Description

      The ALS factorizer fails when run in a distributed environment under Oozie.

      Steps:

      1) Built the Mahout 1.0 jars and picked the mahout-mrlegacy jar.

      2) Created a Java class that calls ParallelALSFactorizationJob with the respective inputs.

      3) Submitted the job, which in turn submitted a series of MapReduce jobs to perform the factorization.

      4) The job failed in MultithreadedSharingMapper with the error: Unable to read sequence file "<ourprogram>.jar". The stack trace pointed to the readMatrixByRowsFromDistributedCache method in org.apache.mahout.cf.taste.hadoop.als.ALS.

      Cause: The ALS class reads its input files, which are sequence files, from the distributed cache using the readMatrixByRowsFromDistributedCache method. When running under Oozie, however, the program jar is also copied to the distributed cache along with the input files. Because the ALS class tries to read every file in the distributed cache as a sequence file, it fails when it reaches the jar.
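      To make the failure mode concrete, here is a minimal, JDK-only sketch; it is not Mahout's code. It models the distributed cache as a plain list of local paths and mimics a reader that assumes every cached file is a sequence file. The class and method names (CacheReadFailureSketch, readAllAsSequenceFiles, hasSequenceFileHeader) and the file names are hypothetical. It relies on the fact that Hadoop SequenceFiles begin with the bytes "SEQ" followed by a version byte, while a jar is a ZIP archive beginning with "PK".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CacheReadFailureSketch {

    /** True if the file starts with the 3-byte SequenceFile magic "SEQ". */
    static boolean hasSequenceFileHeader(Path file) throws IOException {
        byte[] magic = new byte[3];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(magic, 0, 3);
            return n == 3 && magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
        }
    }

    /**
     * Hypothetical stand-in for a reader that treats every cached file as a
     * sequence file: it gives up at the first file that is not one.
     */
    static int readAllAsSequenceFiles(List<Path> cachedFiles) throws IOException {
        int read = 0;
        for (Path p : cachedFiles) {
            if (!hasSequenceFileHeader(p)) {
                throw new IOException("Unable to read sequence file " + p.getFileName());
            }
            read++;
        }
        return read;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache");
        // Hypothetical cache contents: one matrix part file plus the program jar.
        Path matrix = dir.resolve("part-m-00000");
        Path jar = dir.resolve("ourprogram.jar");
        Files.write(matrix, new byte[] {'S', 'E', 'Q', 6});
        Files.write(jar, new byte[] {'P', 'K', 3, 4});
        try {
            readAllAsSequenceFiles(List.of(matrix, jar));
        } catch (IOException e) {
            // prints: Unable to read sequence file ourprogram.jar
            System.out.println(e.getMessage());
        }
    }
}
```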

      A possible remedy is to add a condition so that only files other than jars are read.
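      A minimal sketch of that remedy, again JDK-only and not Mahout's actual code: before treating cached files as matrix inputs, skip anything whose name ends in ".jar". The class and method names (CacheFileFilterSketch, selectMatrixFiles) and the example paths are hypothetical; in Mahout, the equivalent check would have to be applied to the files that readMatrixByRowsFromDistributedCache iterates over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CacheFileFilterSketch {

    /** Keeps only cached paths that are not jar files (case-insensitive suffix check). */
    static List<String> selectMatrixFiles(List<String> cachedPaths) {
        List<String> selected = new ArrayList<>();
        for (String path : cachedPaths) {
            if (!path.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                selected.add(path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical cache listing: a matrix part file and the program jar.
        List<String> cache = List.of(
                "hdfs:///tmp/als/U/part-m-00000",
                "hdfs:///user/oozie/share/lib/ourprogram.jar");
        // prints [hdfs:///tmp/als/U/part-m-00000]
        System.out.println(selectMatrixFiles(cache));
    }
}
```

      The suffix check is the cheapest form of the proposed condition, but any other non-jar file that ends up in the cache would still be read. A stricter variant would accept only files whose names match the expected matrix outputs, or verify the "SEQ" header before reading.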

            People

              smarthi Suneel Marthi
              srini.daruna Srinivasarao Daruna
              Votes: 0
              Watchers: 4
