[MAILBOX-44] [gsoc2011] Design and implement a distributed mailbox using Hadoop - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.4
Component/s: None
Labels:
- gsoc2011

Description

Context: The mailbox subproject (http://james.apache.org/mailbox/) supports maildir, SQL databases (via JPA) and the Java Content Repository (JCR) as mail storage technologies. This flexibility comes from an API design that abstracts mail storage away from the mail protocols.

Task: We need to implement mailbox storage as a distributed system on top of Hadoop HDFS, using the James mailbox API. The first step is to design how to interact with Hadoop (the native API, Apache Gora, currently in the Apache Incubator, ...) and to address the performance questions specific to loading and parsing mail in a distributed system (whether or not to use MapReduce, whether to use existing local Lucene indexes for search, ...). The second step is to implement the HDFS mailbox; the maildir mailbox, which stores each mail as a file, is similar and can serve as inspiration. A single James server will still be deployed, because there is no distributed UID generation.
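The maildir-style approach and the single-server constraint can be illustrated together. The following is a minimal sketch under stated assumptions, not the James mailbox API: the class `HdfsMailLayout`, the directory layout `<root>/<user>/<mailbox>/<uid>.eml`, and the method names are all hypothetical. It only computes path strings and allocates UIDs; a real implementation would wrap each path in Hadoop's `org.apache.hadoop.fs.Path` and write the message through `FileSystem.create(Path)`. The UID counter lives in the memory of one JVM, which is exactly why only a single James server can safely own the mailboxes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: one file per message on HDFS, in the spirit of maildir.
 * UIDs come from an in-memory counter, so this is only correct while a single
 * James server allocates UIDs (there is no distributed UID generation).
 */
public class HdfsMailLayout {
    // Hypothetical HDFS root directory, stored without a trailing slash.
    private final String root;
    // One monotonically increasing UID counter per mailbox, held in this JVM only.
    private final Map<String, AtomicLong> lastUids = new ConcurrentHashMap<>();

    public HdfsMailLayout(String root) {
        this.root = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
    }

    /** Directory holding all messages of one user's mailbox (names assumed not to contain '/'). */
    public String mailboxDir(String user, String mailbox) {
        return root + "/" + user + "/" + mailbox;
    }

    /** Allocates the next UID for a mailbox; UIDs start at 1 and strictly increase. */
    public long nextUid(String user, String mailbox) {
        return lastUids
            .computeIfAbsent(mailboxDir(user, mailbox), k -> new AtomicLong())
            .incrementAndGet();
    }

    /** Path of the file that would hold the raw message for a given UID. */
    public String messagePath(String user, String mailbox, long uid) {
        return mailboxDir(user, mailbox) + "/" + uid + ".eml";
    }

    public static void main(String[] args) {
        HdfsMailLayout layout = new HdfsMailLayout("/james/mailboxes/");
        long first = layout.nextUid("alice", "INBOX");
        long second = layout.nextUid("alice", "INBOX");
        System.out.println(layout.messagePath("alice", "INBOX", first));  // /james/mailboxes/alice/INBOX/1.eml
        System.out.println(layout.messagePath("alice", "INBOX", second)); // /james/mailboxes/alice/INBOX/2.eml
    }
}
```

Keeping one file per message mirrors maildir and keeps appends simple, but HDFS is tuned for large files, so a store of many small messages is one of the performance questions the design step would need to weigh (for example, against packing messages into larger container files).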

Mentor: eric at apache dot org

Complexity: medium

People

Assignee: Norman Maurer

Reporter: Eric Charles

Votes: 0

Watchers: 5

Dates

Created: 30/Mar/11 06:22

Updated: 04/Sep/11 16:04

Resolved: 04/Sep/11 13:10