Details
- Type: Umbrella
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This umbrella issue collects all issues related to checkpointing and task restarting to achieve fault tolerance at the job level.
Issue Links
- relates to: HAMA-504 Cluster High Availability (Open)
1. Add documentation to fault tolerant job processing | Open | Unassigned
2. Handle counters during task recovery | Open | Suraj Menon
3. Recover tasks on failure of groom server | Open | Suraj Menon
4. Confined recovery | Open | Unassigned