  1. James Server
  2. JAMES-1216

[gsoc2011] Design and implement machine learning filters and categorization for mail

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Context: Anti-spam functionality based on SpamAssassin is available in James (based on mailets, http://james.apache.org/mailet). Bayesian mailets are also available, but they are not completely integrated or documented. Nothing is available to automatically categorize mail traffic per user.

      Task: We want to align the existing implementation with a modern anti-spam solution built on a powerful machine learning implementation (such as Apache Mahout). We also want to extend machine learning to mail categorization: spam vs. not-spam is the first category, and it can be extended to any additional category. The implementation can run partly while mail is being spooled and/or once mail is stored in the mailbox.
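
      To make the categorization idea concrete, here is a minimal sketch of a multi-category naive Bayes text classifier in plain Java. It is only an illustration under stated assumptions: the class name MailCategorizer, its train/classify methods, and the simple tokenizer are hypothetical, and it does not use the existing James Bayesian mailets or the Apache Mahout API. A real implementation would presumably wrap something like this in a mailet and work on parsed message headers and bodies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical multinomial naive Bayes categorizer; not part of James or Mahout. */
public class MailCategorizer {
    // category -> (token -> occurrence count)
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    // category -> total number of tokens seen in training
    private final Map<String, Integer> totalTokens = new HashMap<>();
    // category -> number of training messages
    private final Map<String, Integer> docCounts = new HashMap<>();
    // all distinct tokens seen across every category
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Adds one labelled message to the model. */
    public void train(String category, String text) {
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            totalTokens.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
    }

    /** Returns the most likely category, or null if nothing was trained. */
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // Log prior: share of training messages in this category.
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = tokenCounts.get(category);
            int total = totalTokens.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) {
                    continue;
                }
                // Laplace (add-one) smoothing so unseen tokens do not zero out the score.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

      Because categories are plain strings, the same model covers the spam vs. not-spam case and any additional per-user category. Training could happen when mail is stored in the mailbox and classification while spooling, but that split is a design assumption rather than something this issue specifies.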

      Related discussions: See also the discussions on intelligent mail mining at http://markmail.org/message/2bodrwvdvtfq3f2v (Mahout-related) and http://markmail.org/thread/pksl6csyvoeo27yh (Hama-related).

      Mentor: eric at apache dot org & [fill in mentor]

      Complexity: high

          People

            Assignee: Eric Charles (eric@apache.org)
            Reporter: Eric Charles (eric@apache.org)
            Votes: 0
            Watchers: 1