Mahout / MAHOUT-1636

Class dependencies for the spark module are put in a job.jar, which is very inefficient


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • Component/s: spark
    • Labels:

      Description

      Using a Maven plugin and an assembly job.xml, a job.jar is created with all dependencies, including transitive ones. This job.jar is in mahout/spark/target and is included in the classpath when a Spark job is run. This allows dependency classes to be found at runtime, but the job.jar includes a great deal that is not needed and that duplicates classes found in the main mrlegacy job.jar. If the job.jar is removed, drivers will not find needed classes. A better way of including class dependencies needs to be implemented.

      I'm not sure what that better way is, so I am leaving the assembly alone for now. Whoever picks up this Jira will have to remove the assembly after deciding on a better method.
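
One way to gauge the duplication described above is to compare the class entries of the two jars directly. The following is a minimal sketch, not part of the Mahout build: the function names are hypothetical, and it assumes both jars are ordinary zip archives that can be read with Python's standard `zipfile` module.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names stored in a jar (a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(jar_a, jar_b):
    """Return the sorted .class entries that are present in both jars."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))


def duplication_ratio(jar_a, jar_b):
    """Return the fraction of jar_a's classes that also appear in jar_b.

    Returns 0.0 when jar_a contains no .class entries.
    """
    classes_a = class_entries(jar_a)
    if not classes_a:
        return 0.0
    return len(classes_a & class_entries(jar_b)) / len(classes_a)
```

Calling `duplication_ratio` with the path of the spark job.jar first and the mrlegacy job.jar second would give a rough measure of how much of the spark assembly is redundant. It compares entry names only, so two different versions of the same class count as duplicates.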


            People

            • Assignee:
              pferrel Pat Ferrel
              Reporter:
              pferrel Pat Ferrel
            • Votes:
              0
              Watchers:
              7
