Mahout: MAHOUT-798

Add Examples for the ASF Mail Archive

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels:
      None

      Description

      Per http://www.lucidimagination.com/search/document/c6ea889edb9ad0fe/email_and_collab_filtering, I am working on a variety of examples based on the ASF email archive. WIP will be at https://github.com/lucidimagination/mahout.

      I intend to have at least three examples: one each for classification, clustering, and collaborative filtering.

        Activity

        Transition          | Time In Source Status | Execution Times | Last Executer   | Last Execution Date
        Open -> Resolved    | 38d 17h 16m           | 1               | Grant Ingersoll | 07/Oct/11 15:13
        Resolved -> Closed  | 124d 23h 47m          | 1               | Sean Owen       | 09/Feb/12 14:01
        Sean Owen made changes -
        Status: Resolved -> Closed
        Hudson added a comment -

        Integrated in Mahout-Quality #1199 (See https://builds.apache.org/job/Mahout-Quality/1199/)
        MAHOUT-798: fix some edge cases around handling ids

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206335
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/FromEmailToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MsgIdToDictionaryMapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
        Hudson added a comment -

        Integrated in Mahout-Quality #1193 (See https://builds.apache.org/job/Mahout-Quality/1193/)
        MAHOUT-798: minor bug fixes with recommendation example to remove dups and properly handle missing dictionary hits

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1205271
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecReducer.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
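        The commit above removes duplicates and handles missing dictionary hits in the recommendation example. A minimal sketch of that idea, assuming each raw preference is a (sender address, message id) pair mapped to numeric ids through two string-to-long dictionaries: the class MailPrefsSketch and its toPrefs method are hypothetical and do not reproduce the code in MailToRecMapper or MailToRecReducer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: build unique "userId,itemId" preferences from mail pairs. */
public final class MailPrefsSketch {

  private MailPrefsSketch() {}

  /**
   * Maps each (fromAddress, messageId) pair through two dictionaries.
   * Pairs with a missing dictionary entry are skipped, and duplicates are dropped.
   */
  public static List<String> toPrefs(List<String[]> rawPairs,
                                     Map<String, Long> fromDictionary,
                                     Map<String, Long> msgIdDictionary) {
    // LinkedHashSet drops duplicate pairs while keeping first-seen order.
    Set<String> unique = new LinkedHashSet<>();
    for (String[] pair : rawPairs) {
      Long userId = fromDictionary.get(pair[0]);
      Long itemId = msgIdDictionary.get(pair[1]);
      // A missing dictionary hit returns null: skip the pair instead of failing.
      if (userId == null || itemId == null) {
        continue;
      }
      unique.add(userId + "," + itemId);
    }
    return new ArrayList<>(unique);
  }

  public static void main(String[] args) {
    Map<String, Long> from = Map.of("alice@example.org", 0L, "bob@example.org", 1L);
    Map<String, Long> msgIds = Map.of("<m1@example.org>", 10L, "<m2@example.org>", 11L);
    List<String[]> raw = List.of(
        new String[] {"alice@example.org", "<m1@example.org>"},
        new String[] {"alice@example.org", "<m1@example.org>"}, // duplicate
        new String[] {"carol@example.org", "<m2@example.org>"}, // sender not in dictionary
        new String[] {"bob@example.org", "<m2@example.org>"});
    System.out.println(toPrefs(raw, from, msgIds)); // prints [0,10, 1,11]
  }
}
```

        Skipping unknown ids rather than throwing keeps one unmatched message from failing a whole job; whether the real example skips or logs such pairs is not stated here.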
        Hudson added a comment -

        Integrated in Mahout-Quality #1094 (See https://builds.apache.org/job/Mahout-Quality/1094/)
        MAHOUT-798: fix recommender content extraction from email

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183379
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailOptions.java
        Hudson added a comment -

        Integrated in Mahout-Quality #1091 (See https://builds.apache.org/job/Mahout-Quality/1091/)
        MAHOUT-798: restrict the number of items per label to avoid overtraining

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1181061
        Files :

        • /mahout/trunk/examples/bin/build-asf-email.sh
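        Restricting the number of items per label, as this commit does in build-asf-email.sh, can be illustrated with a small Java sketch that keeps at most maxPerLabel documents per label in input order. The PerLabelCapSketch class, its Doc type, and the first-N selection policy are hypothetical; the script's actual mechanism is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep at most maxPerLabel training documents per label. */
public final class PerLabelCapSketch {

  private PerLabelCapSketch() {}

  /** A labeled training document (hypothetical type). */
  public static final class Doc {
    public final String label;
    public final String text;

    public Doc(String label, String text) {
      this.label = label;
      this.text = text;
    }
  }

  /** Returns the first maxPerLabel documents for each label, in input order. */
  public static List<Doc> capPerLabel(List<Doc> docs, int maxPerLabel) {
    Map<String, Integer> counts = new HashMap<>();
    List<Doc> kept = new ArrayList<>();
    for (Doc doc : docs) {
      int seen = counts.getOrDefault(doc.label, 0);
      if (seen < maxPerLabel) {
        kept.add(doc);
        counts.put(doc.label, seen + 1);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Doc> docs = List.of(
        new Doc("label-a", "1"), new Doc("label-a", "2"),
        new Doc("label-a", "3"), new Doc("label-b", "4"));
    // With a cap of 2, the third label-a document is dropped.
    System.out.println(capPerLabel(docs, 2).size()); // prints 3
  }
}
```

        Taking the first N per label is the simplest policy; a random sample per label would avoid biasing toward whatever order the archive was read in.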
        Hudson added a comment -

        Integrated in Mahout-Quality #1085 (See https://builds.apache.org/job/Mahout-Quality/1085/)
        MAHOUT-798: add in examples for working with ASF email archive, plus various refactorings to clusterdumper, etc. for viewing results

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1180043
        Files :

        • /mahout/trunk/bin/mahout
        • /mahout/trunk/buildtools
        • /mahout/trunk/core
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/AbstractNaiveBayesClassifier.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/BayesUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test/BayesTestMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/IndexInstancesMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/ThetaMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/AbstractJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/HadoopUtil.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/driver/MahoutDriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRFirstStep.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/naivebayes/NaiveBayesTest.java
        • /mahout/trunk/distribution
        • /mahout/trunk/examples
        • /mahout/trunk/examples/bin/build-asf-email.sh
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/EmailUtility.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/FromEmailToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToDictionaryReducer.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToRecMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/cf/taste/example/email/MsgIdToDictionaryMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/bayes/SplitBayesInput.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailMapper.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailReducer.java
        • /mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example/email
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/cf/taste/example/email/MailToPrefsTest.java
        • /mahout/trunk/examples/src/test/java/org/apache/mahout/classifier/bayes/SplitBayesInputTest.java
        • /mahout/trunk/integration/bin/prep_asf_mail_archives.sh
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/ChunkedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/PrefixAdditionFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectory.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectoryFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/SplitInput.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/GraphMLClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailOptions.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/email/MailProcessor.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/ChunkedWrapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/ChunkedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/IOWriterWrapper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/io/WrappedWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/AbstractClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/CSVClusterWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/ClusterDumperWriter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/io/ClusterWriter.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/text/SequenceFilesFromMailArchivesTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/email
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/email/MailProcessorTest.java
        • /mahout/trunk/integration/src/test/resources
        • /mahout/trunk/integration/src/test/resources/test.mbox
        • /mahout/trunk/src/conf/driver.classes.props
        Grant Ingersoll made changes -
        Status: Open -> Resolved
        Fix Version/s: 0.6
        Resolution: Fixed
        Grant Ingersoll added a comment -

        Added examples of clustering, classification, and recommendation using the ASF data set. Also added the ability to dump clusters to files in various formats, using a pluggable Writer approach. Made various other refactorings.
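        A pluggable writer can be sketched as an interface with one implementation per output format, selected by name at run time. The commit adds classes such as ClusterWriter, CSVClusterWriter, and GraphMLClusterWriter; the ClusterOutput interface, its two implementations, and forFormat below are hypothetical and do not reproduce their signatures.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a pluggable writer for dumping clusters. */
public final class ClusterWriterSketch {

  private ClusterWriterSketch() {}

  /** One implementation per output format; callers depend only on this interface. */
  public interface ClusterOutput {
    void write(Map<Integer, List<String>> clusters, Writer out) throws IOException;
  }

  /** CSV: cluster id followed by its comma-separated member ids, one line per cluster. */
  public static final class CsvOutput implements ClusterOutput {
    @Override
    public void write(Map<Integer, List<String>> clusters, Writer out) throws IOException {
      for (Map.Entry<Integer, List<String>> e : clusters.entrySet()) {
        out.write(e.getKey() + "," + String.join(",", e.getValue()) + "\n");
      }
    }
  }

  /** Plain text: a readable summary line per cluster. */
  public static final class TextOutput implements ClusterOutput {
    @Override
    public void write(Map<Integer, List<String>> clusters, Writer out) throws IOException {
      for (Map.Entry<Integer, List<String>> e : clusters.entrySet()) {
        out.write("Cluster " + e.getKey() + " (" + e.getValue().size() + " points): "
            + e.getValue() + "\n");
      }
    }
  }

  /** Selects a writer by format name, e.g. from a command-line option. */
  public static ClusterOutput forFormat(String format) {
    switch (format) {
      case "csv":
        return new CsvOutput();
      case "text":
        return new TextOutput();
      default:
        throw new IllegalArgumentException("Unknown format: " + format);
    }
  }

  public static void main(String[] args) throws IOException {
    // TreeMap gives a deterministic cluster order in the output.
    Map<Integer, List<String>> clusters = new TreeMap<>();
    clusters.put(0, List.of("msg1", "msg2"));
    clusters.put(1, List.of("msg3"));
    StringWriter out = new StringWriter();
    forFormat("csv").write(clusters, out);
    System.out.print(out); // prints "0,msg1,msg2" then "1,msg3", one per line
  }
}
```

        Under this design, supporting another format (GraphML, say) means adding one implementation and one case in forFormat, without touching the code that produces the clusters.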

        Grant Ingersoll made changes -
        Assignee: set to Grant Ingersoll (gsingers)
        Grant Ingersoll created issue -

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Grant Ingersoll
          • Votes:
            0
            Watchers:
            0
