Mahout: MAHOUT-798

Add Examples for the ASF Mail Archive

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels:
      None

      Description

      Per http://www.lucidimagination.com/search/document/c6ea889edb9ad0fe/email_and_collab_filtering, I am working on a variety of examples based on the ASF email archive. Work in progress will be at https://github.com/lucidimagination/mahout.

      I intend to have at least three examples: one each for classification, clustering, and collaborative filtering.
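All three examples start from the raw archive files. Assuming those are classic mbox files (the tests added below include a test.mbox resource), each message begins with an envelope line starting with "From ". The following is a minimal sketch of that first splitting step; MboxSplitter is a hypothetical name, not a Mahout class, and it ignores ">From " quoting inside bodies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: splits mbox text into messages.
final class MboxSplitter {
    private MboxSplitter() {}

    // Assumes classic mbox framing: every message starts with a line that
    // begins with "From " (the envelope line). The envelope line itself is
    // dropped, text before the first envelope line is ignored, and ">From "
    // quoting inside bodies is not undone.
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // null until the first envelope line
        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string holds one message's headers and body, which later steps can parse further.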

        Activity

        Grant Ingersoll added a comment -

        Added examples of clustering, classification, and recommendation using the ASF data set. Also added the ability to dump clusters to files in various formats through a pluggable Writer approach, and made various other refactorings.
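A pluggable writer here means one interface with one implementation per output format, so the dumping code can pick the format at run time. The sketch below shows that pattern under simplifying assumptions: clusters are a map from cluster id to points (double[]), and SimpleClusterWriter, CsvClusterWriter, TextSummaryClusterWriter, and ClusterDump are hypothetical names. Mahout's own ClusterWriter, CSVClusterWriter, and GraphMLClusterWriter (listed in the commit below) have their own signatures, which this does not reproduce.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in point: one implementation per output format.
interface SimpleClusterWriter {
    void write(Map<Integer, List<double[]>> clusters, Writer out) throws IOException;
}

// CSV format: one line per point, "clusterId,x1,x2,...".
final class CsvClusterWriter implements SimpleClusterWriter {
    @Override
    public void write(Map<Integer, List<double[]>> clusters, Writer out) throws IOException {
        for (Map.Entry<Integer, List<double[]>> cluster : clusters.entrySet()) {
            for (double[] point : cluster.getValue()) {
                StringBuilder line = new StringBuilder().append(cluster.getKey());
                for (double value : point) {
                    line.append(',').append(value);
                }
                out.write(line.append('\n').toString());
            }
        }
        out.flush();
    }
}

// Plain-text format: one summary line per cluster with its size.
final class TextSummaryClusterWriter implements SimpleClusterWriter {
    @Override
    public void write(Map<Integer, List<double[]>> clusters, Writer out) throws IOException {
        for (Map.Entry<Integer, List<double[]>> cluster : clusters.entrySet()) {
            out.write("cluster " + cluster.getKey() + ": " + cluster.getValue().size() + " points\n");
        }
        out.flush();
    }
}

// Caller that depends only on the interface, so formats are swappable.
final class ClusterDump {
    private ClusterDump() {}

    static String dump(SimpleClusterWriter writer, Map<Integer, List<double[]>> clusters) {
        StringWriter buffer = new StringWriter();
        try {
            writer.write(clusters, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```

Adding another format, such as GraphML, would mean adding one more implementation; the calling code stays the same.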

        Hudson added a comment -

        Integrated in Mahout-Quality #1085 (See https://builds.apache.org/job/Mahout-Quality/1085/)
        MAHOUT-798: add in examples for working with ASF email archive, plus various refactorings to clusterdumper, etc. for viewing results

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1180043
        Files :

        • /mahout/trunk/bin/mahout
        • /mahout/trunk/buildtools
        • /mahout/trunk/core
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/AbstractNaiveBayesClassifier.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/BayesUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test/BayesTestMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/IndexInstancesMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/ThetaMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/AbstractJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/HadoopUtil.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/driver/MahoutDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRFirstStep.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/naivebayes/NaiveBayesTest.java
        • /mahout/trunk/distribution
        • /mahout/trunk/examples
        • /mahout/trunk/examples/bin/build-asf-email.sh
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/FromEmailToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToDictionaryReducer.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MsgIdToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/bayes/SplitBayesInput.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailReducer.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example/email
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example/email/MailToPrefsTest.java
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/classifier/bayes/SplitBayesInputTest.java
        • /mahout/trunk/integration/bin/prep_asf_mail_archives.sh
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/ChunkedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/PrefixAdditionFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectory.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectoryFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/SplitInput.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/GraphMLClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailOptions.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/ChunkedWrapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/ChunkedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/IOWriterWrapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/WrappedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/AbstractClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/CSVClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/ClusterDumperWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/ClusterWriter.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/text/SequenceFilesFromMailArchivesTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/email
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/email/MailProcessorTest.java
        • /mahout/trunk/integration/src/test/resources
        • /mahout/trunk/integration/src/test/resources/test.mbox
        • /mahout/trunk/src/conf/driver.classes.props
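The file list includes SplitInput and SplitBayesInput, which, judging by their names, divide labeled input into training and test sets for the classification example. A generic version of that step, a seeded random split by test percentage, might look like the following; RandomSplit is a hypothetical helper and does not mirror the options of Mahout's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not Mahout's SplitInput: seeded random train/test split.
final class RandomSplit {
    private RandomSplit() {}

    // Returns [train, test]; testPercent of the items (rounded down) go to test.
    static <T> List<List<T>> split(List<T> items, int testPercent, long seed) {
        if (testPercent < 0 || testPercent > 100) {
            throw new IllegalArgumentException("testPercent must be in [0, 100]: " + testPercent);
        }
        List<T> shuffled = new ArrayList<>(items);
        // A fixed seed makes the split reproducible between runs.
        Collections.shuffle(shuffled, new Random(seed));
        int testSize = shuffled.size() * testPercent / 100;
        List<T> test = new ArrayList<>(shuffled.subList(0, testSize));
        List<T> train = new ArrayList<>(shuffled.subList(testSize, shuffled.size()));
        return List.of(train, test);
    }
}
```

Reusing the same seed lets training and evaluation be repeated on an identical split.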
        Hudson added a comment -

        Integrated in Mahout-Quality #1091 (See https://builds.apache.org/job/Mahout-Quality/1091/)
        MAHOUT-798: restrict the number of items per label to avoid overtraining

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1181061
        Files :

        • /mahout/trunk/examples/bin/build-asf-email.sh
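This change lives in build-asf-email.sh, and the comment does not describe its mechanism. The idea can be sketched in Java: keep at most a fixed number of examples per label, presumably so that a few very large labels do not dominate training. LabelCap and the (label, text) entry representation are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cap the number of training examples per label.
final class LabelCap {
    private LabelCap() {}

    // Each entry is (label, document text). Keeps the first maxPerLabel
    // entries seen for each label, in input order, and drops the rest.
    static List<Map.Entry<String, String>> cap(List<Map.Entry<String, String>> docs, int maxPerLabel) {
        Map<String, Integer> counts = new HashMap<>();
        List<Map.Entry<String, String>> kept = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs) {
            int seen = counts.getOrDefault(doc.getKey(), 0);
            if (seen < maxPerLabel) {
                kept.add(doc);
                counts.put(doc.getKey(), seen + 1);
            }
        }
        return kept;
    }
}
```

Taking the first N per label inherits any ordering bias in the input (for example, oldest messages first); shuffling beforehand is a simple alternative.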
        Hudson added a comment -

        Integrated in Mahout-Quality #1094 (See https://builds.apache.org/job/Mahout-Quality/1094/)
        MAHOUT-798: fix recommender content extraction from email

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183379
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailOptions.java
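The commit message does not say what was wrong with the extraction. For context, the recommender example needs values such as the sender from each message's header block. A minimal sketch of header extraction, assuming RFC 5322-style headers ("Name: value", folded continuation lines that start with whitespace, and a blank line ending the headers); HeaderExtractor is a hypothetical class, not Mahout's EmailUtility or MailProcessor.

```java
import java.util.Locale;

// Hypothetical sketch, not Mahout's EmailUtility or MailProcessor.
final class HeaderExtractor {
    private HeaderExtractor() {}

    // Returns the first header called `name` (case-insensitive) from the header
    // block of a raw message, unfolding continuation lines; null if absent.
    static String header(String message, String name) {
        String prefix = name.toLowerCase(Locale.ROOT) + ":";
        String[] lines = message.split("\r?\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // a blank line ends the header block
            }
            if (lines[i].toLowerCase(Locale.ROOT).startsWith(prefix)) {
                StringBuilder value = new StringBuilder(lines[i].substring(prefix.length()).trim());
                // Folded headers continue on lines that start with a space or tab.
                while (i + 1 < lines.length && isContinuation(lines[i + 1])) {
                    value.append(' ').append(lines[++i].trim());
                }
                return value.toString();
            }
        }
        return null;
    }

    // Extracts the address from values like "Name <user@host>". Lower-casing it,
    // so one sender maps to one user, is an assumption of this sketch.
    static String address(String fromValue) {
        int open = fromValue.indexOf('<');
        int close = fromValue.indexOf('>', open + 1);
        String addr = (open >= 0 && close > open) ? fromValue.substring(open + 1, close) : fromValue;
        return addr.trim().toLowerCase(Locale.ROOT);
    }

    private static boolean isContinuation(String line) {
        return !line.isEmpty() && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
    }
}
```

Stopping at the first blank line keeps body text such as a quoted "From:" line from being mistaken for a header.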
        Hudson added a comment -

        Integrated in Mahout-Quality #1193 (See https://builds.apache.org/job/Mahout-Quality/1193/)
        MAHOUT-798: minor bug fixes with recommendation example to remove dups and properly handle missing dictionary hits

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1205271
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecReducer.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
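The two fixes named in this commit can be illustrated together: when turning string pairs into numeric preferences, drop duplicate pairs and skip pairs with no dictionary entry instead of failing. In this sketch the (sender, thread) pairing is inferred from the FromEmailToDictionaryMapper and MsgIdToDictionaryMapper names, and PrefBuilder, the string-to-long dictionaries, and the "user,item" output lines are all hypothetical; it is not the code in MailToRecMapper or MailToRecReducer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: build deduplicated "user,item" preference lines.
final class PrefBuilder {
    private PrefBuilder() {}

    // Each pair is (sender, thread key). Pairs whose sender or thread has no
    // dictionary entry are skipped; duplicates are kept once, in first-seen order.
    static List<String> toPrefLines(List<Map.Entry<String, String>> pairs,
                                    Map<String, Long> userDict,
                                    Map<String, Long> itemDict) {
        Set<String> lines = new LinkedHashSet<>();
        for (Map.Entry<String, String> pair : pairs) {
            Long user = userDict.get(pair.getKey());
            Long item = itemDict.get(pair.getValue());
            if (user == null || item == null) {
                continue; // missing dictionary hit: skip rather than fail
            }
            lines.add(user + "," + item);
        }
        return new ArrayList<>(lines);
    }
}
```

Deduplicating on the formatted line keeps the sketch short; a larger job would more likely deduplicate on the numeric pair in a reducer.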
        Hudson added a comment -

        Integrated in Mahout-Quality #1199 (See https://builds.apache.org/job/Mahout-Quality/1199/)
        MAHOUT-798: fix some edge cases around handling ids

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206335
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/FromEmailToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MsgIdToDictionaryMapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
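The specific edge cases are not listed. One common source of them is that Message-ID, In-Reply-To, and References headers may hold several ids, with or without angle brackets and with or without whitespace between them. A hypothetical parser that handles those cases follows; MessageIds is an illustrative name, and the bare-id fallback is an assumption about messy archive data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: normalize ids from Message-ID, In-Reply-To or References.
final class MessageIds {
    private MessageIds() {}

    // An id in angle brackets, e.g. <abc@example.org>; brackets may be adjacent.
    private static final Pattern BRACKETED = Pattern.compile("<([^<>\\s]+)>");

    static List<String> parse(String headerValue) {
        List<String> ids = new ArrayList<>();
        if (headerValue == null || headerValue.isBlank()) {
            return ids; // missing or empty header: no ids
        }
        Matcher m = BRACKETED.matcher(headerValue);
        while (m.find()) {
            ids.add(m.group(1));
        }
        if (ids.isEmpty()) {
            // Assumed fallback for ids stored without brackets: keep bare tokens,
            // skipping fragments such as an empty "<>".
            for (String token : headerValue.trim().split("\\s+")) {
                if (token.indexOf('<') < 0 && token.indexOf('>') < 0) {
                    ids.add(token);
                }
            }
        }
        return ids;
    }
}
```

Returning an empty list for absent or blank headers lets callers treat "no parent message" uniformly instead of checking for null.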

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Grant Ingersoll
          • Votes:
            0
            Watchers:
            0
