[HAMA-636] Confined recovery - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: bsp core, messaging
Labels:
- gsoc2014
- java

Description

"Confined recovery" mentioned in Pregel paper can be used to improve the cost and latency of recovery.

In addition to the existing HDFS checkpoints,1) the tasks log outgoing messages to local filesystem for each superstep (See disk queue). When a task fails, 2) it reverts to the last checkpoint. 3) Other tasks re-send messages sent to failed task at each superstep occurring after the last checkpoint.

Today we write the checkpointed messages to HDFS. We want these files to be written on local filesystem. There should be a way these files could be moved across to optimize the fault recovery process.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Edward J. Yoon

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 10/Sep/12 04:12

Updated:: 13/Feb/14 10:16