Mahout / MAHOUT-1569

Create CLI driver that supports Spark jobs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: CLI
    • Labels:
    • Environment: Scala, Spark

      Description

      Create a design for CLI drivers, including an option parser and a base MahoutDriver for Spark, that uses the text file I/O mechanism from MAHOUT-1568.

      A version of the proposal is implemented and running for ItemSimilarity on Spark (MAHOUT-1541). It is documented on the GitHub wiki here: https://github.com/pferrel/harness/wiki

      Comments are appreciated.

        Activity

        pferrel Pat Ferrel added a comment -

        This is being merged into MAHOUT-1541. There are abstract MahoutDriver classes and an option parser.

        To take this further, a set of global options will be made available to the option parser so that, for instance, the schema for an output text-delimited file can be specified on the command line with the same options in every driver by mixing in that group of options. Currently they must be reimplemented in each driver.
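
        The "mixing in" of option groups described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code that was merged: every name here (SketchOptionParser, FileIOOptions, TextDelimitedOutputOptions, and the flag names) is hypothetical, and the parser is a toy that only understands "--flag value" pairs.

```scala
import scala.collection.mutable

// Hypothetical sketch: none of these names come from the Mahout code base.
// Base parser: keeps a registry of known flags and their default values.
class SketchOptionParser {
  // flag -> default value, in registration order
  protected val defaults = mutable.LinkedHashMap.empty[String, String]

  def knownFlags: Seq[String] = defaults.keys.toSeq

  // Parse "--flag value" pairs, starting from the registered defaults.
  def parse(args: Seq[String]): Map[String, String] =
    args.grouped(2).foldLeft(defaults.toMap) {
      case (acc, Seq(flag, value)) if acc.contains(flag) =>
        acc.updated(flag, value)
      case (_, other) =>
        val shown = other.mkString(" ")
        throw new IllegalArgumentException("unknown or incomplete option: " + shown)
    }
}

// Option group shared by every driver: input and output locations.
trait FileIOOptions extends SketchOptionParser {
  defaults ++= Seq("--input" -> "", "--output" -> "")
}

// Option group describing the schema of a text-delimited output file.
trait TextDelimitedOutputOptions extends SketchOptionParser {
  defaults ++= Seq("--outDelim" -> "\t", "--elementDelim" -> " ")
}

// A driver's parser picks up whole groups by mixing them in,
// instead of re-declaring each flag.
class ItemSimilaritySketchParser
  extends SketchOptionParser
  with FileIOOptions
  with TextDelimitedOutputOptions
```

        With this shape, a second driver that writes the same kind of output would declare its parser with the same two traits and add only its own flags.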

        hudson Hudson added a comment -

        FAILURE: Integrated in Mahout-Quality #2682 (See https://builds.apache.org/job/Mahout-Quality/2682/)
        MAHOUT-1561, MAHOUT-1568, MAHOUT-1569 text-delimited Spark readers and writers with drivers and a CLI for 'spark-itemsimilarity' closes apache/mahout#22 (pat: rev 2b65475c3ab682ebd47cffdc6b502698799cd2c8)

        • spark/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala
        • spark/src/main/scala/org/apache/mahout/drivers/FileSysUtils.scala
        • spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
        • spark/pom.xml
        • spark/src/main/scala/org/apache/mahout/sparkbindings/io/MahoutKryoRegistrator.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala
        • spark/src/test/scala/org/apache/mahout/sparkbindings/test/MahoutLocalContext.scala
        • bin/mahout
        • spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala
        • spark/src/main/assembly/job.xml
        • spark/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala
        • spark/src/main/scala/org/apache/mahout/drivers/MahoutDriver.scala
        • spark/src/test/scala/org/apache/mahout/drivers/ItemSimilarityDriverSuite.scala
        • CHANGELOG
        • spark/src/test/scala/org/apache/mahout/cf/CooccurrenceAnalysisSuite.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala
        • spark/src/main/scala/org/apache/mahout/drivers/Schema.scala
        pferrel Pat Ferrel added a comment -

        First cut pushed. There is an option parser, a MahoutDriver class to extend, and an example in ItemSimilarityDriver.

        Next will be some DRYing of repeated code. Many of the CLI options will be repeated across jobs, so they need to be moved into shared code: things like -i, -o, and a bunch of text file I/O format options at least. It looks like MahoutOptionParser might be the place for this.
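
        One way to picture "a driver class to extend" with the shared flags pulled out of each job is sketched below. It is only an illustration: SketchDriver, ItemSimilaritySketchDriver, and the --limit flag are made-up names, not the real MahoutDriver API, and the job body just returns a string rather than running anything on Spark.

```scala
// Hypothetical sketch; not the actual MahoutDriver API.
abstract class SketchDriver {
  // Flags every job shares, with defaults, declared once in the base class.
  private val sharedDefaults = Map("-i" -> "", "-o" -> "")

  // Flags specific to one job, with defaults.
  protected def jobDefaults: Map[String, String]

  // The job itself; receives the fully resolved options.
  protected def process(opts: Map[String, String]): String

  // Resolve options in one place, then hand off to the concrete job.
  final def run(args: Array[String]): String = {
    require(args.length % 2 == 0, "every flag needs a value")
    val known = sharedDefaults ++ jobDefaults
    val supplied = args.grouped(2).map(pair => pair(0) -> pair(1)).toMap
    val unknown = supplied.keySet.diff(known.keySet)
    require(unknown.isEmpty, "unknown options: " + unknown.mkString(", "))
    process(known ++ supplied)
  }
}

// A concrete driver declares only what is unique to it.
object ItemSimilaritySketchDriver extends SketchDriver {
  protected def jobDefaults: Map[String, String] = Map("--limit" -> "10")

  protected def process(opts: Map[String, String]): String =
    "read " + opts("-i") + ", write " + opts("-o") + ", limit=" + opts("--limit")
}
```

        Here -i and -o never appear in the concrete driver's declarations; it only reads their resolved values.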

        hudson Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2684 (See https://builds.apache.org/job/Mahout-Quality/2684/)
        MAHOUT-1541, MAHOUT-1568, MAHOUT-1569 fixed a build test problem, drivers have a new option to not search for MAHOUT_HOME and SPARK_HOME (pat: rev 32badb1d360ddf514e6b253f2dea9ae7e5df078a)

        • spark/src/main/scala/org/apache/mahout/drivers/MahoutDriver.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala
        • spark/src/test/scala/org/apache/mahout/drivers/ItemSimilarityDriverSuite.scala
        hudson Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2733 (See https://builds.apache.org/job/Mahout-Quality/2733/)
        MAHOUT-1541, MAHOUT-1568, MAHOUT-1569 refactoring the options parser and option defaults to DRY up individual driver code putting more in base classes, tightened up the test suite with a better way of comparing actual with correct (pat: rev a80974037853c5227f9e5ef1c384a1fca134746e)

        • math-scala/src/main/scala/org/apache/mahout/math/cf/CooccurrenceAnalysis.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala
        • spark/src/main/scala/org/apache/mahout/sparkbindings/io/MahoutKryoRegistrator.scala
        • spark/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala
        • spark/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala
        • spark/src/main/scala/org/apache/mahout/drivers/MahoutDriver.scala
        • spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
        • spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala
        • spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala
        • spark/src/test/scala/org/apache/mahout/drivers/ItemSimilarityDriverSuite.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala
        • spark/src/main/scala/org/apache/mahout/drivers/Schema.scala
        • spark/src/test/scala/org/apache/mahout/cf/CooccurrenceAnalysisSuite.scala
        hudson Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2768 (See https://builds.apache.org/job/Mahout-Quality/2768/)
        MAHOUT-1604 add a CLI and associated code for spark-rowsimilarity, also cleans up some things in MAHOUT-1568 and MAHOUT-1569, closes apache/mahout#47 (pat: rev 149c98592fe447c98dfb5afc67b5809725cc3056)

        • spark/pom.xml
        • spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
        • CHANGELOG
        • spark/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala
        • spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala
        • spark/src/main/scala/org/apache/mahout/drivers/MahoutDriver.scala
        • math-scala/src/main/scala/org/apache/mahout/math/scalabindings/MatrixOps.scala
        • spark/src/main/scala/org/apache/mahout/drivers/FileSysUtils.scala
        • spark/src/test/scala/org/apache/mahout/cf/CooccurrenceAnalysisSuite.scala
        • spark/src/test/scala/org/apache/mahout/drivers/RowSimilarityDriverSuite.scala
        • math-scala/src/main/scala/org/apache/mahout/math/drm/RLikeDrmOps.scala
        • math-scala/src/main/scala/org/apache/mahout/math/cf/CooccurrenceAnalysis.scala
        • bin/mahout
        • spark/src/main/scala/org/apache/mahout/sparkbindings/SparkEngine.scala
        • spark/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala
        • spark/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala
        • spark/src/main/scala/org/apache/mahout/drivers/Schema.scala
        • spark/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala
        • math-scala/src/main/scala/org/apache/mahout/math/cf/SimilarityAnalysis.scala
        • math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MatrixOpsSuite.scala
        • spark/src/test/scala/org/apache/mahout/drivers/ItemSimilarityDriverSuite.scala
        Andrew_Palumbo Andrew Palumbo added a comment -

        Pat Ferrel this can be closed, right?

        pferrel Pat Ferrel added a comment -

        Still not completely satisfied with this. The issue is the need for lots of casts of CLI params, but this is more aesthetic than anything else, so closing.
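
        For context on the cast problem: when parsed options live in a map of untyped values, every read needs something like asInstanceOf[Int] at the call site. One possible way around that, shown purely as a sketch and not as what Mahout does, is to let the option key carry its value type. Opt and TypedOptions are hypothetical names.

```scala
// Hypothetical sketch; Opt and TypedOptions are not Mahout classes.
// A key that carries the type of its value along with a default.
final case class Opt[T](name: String, default: T)

final class TypedOptions(values: Map[String, Any] = Map.empty) {
  // The only cast lives here. It is safe as long as values are stored
  // through `set` and no two keys of different types share a name.
  def apply[T](opt: Opt[T]): T =
    values.getOrElse(opt.name, opt.default).asInstanceOf[T]

  // Returns a new instance; the compiler checks that value matches the key type.
  def set[T](opt: Opt[T], value: T): TypedOptions =
    new TypedOptions(values.updated(opt.name, value))
}
```

        A caller would then write something like val n: Int = options(MaxItems), where MaxItems is an Opt[Int], with no cast at the point of use.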

        pferrel Pat Ferrel added a comment -

        Working

        sslavic Stevo Slavic added a comment -

        Bulk closing all 0.10.0 resolved issues


          People

          • Assignee: Pat Ferrel (pferrel)
          • Reporter: Pat Ferrel (pferrel)
          • Votes: 0
          • Watchers: 4
