Description
Tephra transactions are used not only for real-time processing but also for batch processing.
In batch processing there are cases where the single unit of processing (a job) consists of many subunits (tasks), each with its own lifecycle. Such subunits can fail, can run in parallel, and can be attempted multiple times, either to recover from a failure or for performance optimization. One example of such batch data processing is MapReduce: a job may consist of multiple Map and Reduce tasks that can be re-attempted individually after a failure, or attempted multiple times during speculative execution.
At the moment Tephra supports "long-running" transactions that can be used to cover a batch job. Using one guarantees that no incomplete changes of a job that did not finish properly (failed, killed, cancelled) are visible to others.
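For illustration, here is a minimal sketch of how a whole batch job can be bracketed by a single long-running transaction. The TxClient interface, its method names, and LongTxJobRunner are hypothetical stand-ins (not Tephra's actual client API) used only to show the pattern:

{code:java}
// Hypothetical sketch: bracketing a batch job with one long-running
// transaction so that none of its writes are visible until the job succeeds.
// TxClient and its methods are illustrative stand-ins, not Tephra's exact API.
public final class LongTxJobRunner {

  interface TxClient {
    long startLong();           // start a long-running transaction, returns tx id
    void commit(long txId);     // make all writes of the transaction visible
    void invalidate(long txId); // mark the transaction invalid; its writes stay hidden
  }

  interface BatchJob {
    void run(long txId) throws Exception; // all tasks tag their writes with txId
  }

  private final TxClient txClient;

  LongTxJobRunner(TxClient txClient) {
    this.txClient = txClient;
  }

  void execute(BatchJob job) throws Exception {
    long txId = txClient.startLong();
    try {
      job.run(txId);           // may spawn many tasks; all of them write under txId
      txClient.commit(txId);   // only now do the job's writes become visible
    } catch (Exception e) {
      txClient.invalidate(txId); // failed/killed job: its writes remain invisible
      throw e;
    }
  }
}
{code}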
At the same time, this does not guarantee that changes made by multiple attempts of the same subtask cannot break data consistency: the same data operations may be performed multiple times by different attempts of a subtask, so consistency is preserved only if the data operations within subtasks are idempotent.
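As a concrete illustration of the problem, consider a non-idempotent counter update performed by two attempts of the same task. Since both attempts run under the same long-running transaction, committing the job makes the effects of both attempts visible. The in-memory map below is a simplified stand-in for the data store:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of why re-attempted tasks must be idempotent today.
public final class ReattemptExample {

  public static void main(String[] args) {
    Map<String, Long> store = new HashMap<>();
    store.put("rowCount", 0L);

    // Non-idempotent operation: each attempt adds its delta.
    // Attempt 1 runs, is then presumed failed (e.g. lost node) and re-attempted.
    store.merge("rowCount", 100L, Long::sum); // attempt 1
    store.merge("rowCount", 100L, Long::sum); // attempt 2 (re-attempt of the same task)
    System.out.println("non-idempotent: " + store.get("rowCount")); // 200 -- double counted

    // Idempotent operation: each attempt writes the same absolute value,
    // so a re-attempt cannot corrupt the result.
    store.put("rowCountAbsolute", 100L); // attempt 1
    store.put("rowCountAbsolute", 100L); // attempt 2
    System.out.println("idempotent: " + store.get("rowCountAbsolute")); // 100
  }
}
{code}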
Having nested transactions would allow relaxing this constraint by:
- running each individual subtask in a separate "nested transaction"
- making the changes of subtasks performed in nested transactions visible only if the parent transaction succeeds (see the sketch after this list)
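A possible shape of such an API, purely as a sketch of the intended semantics (none of these types or methods exist in Tephra today):

{code:java}
// Hypothetical sketch of nested-transaction semantics for a batch job.
// Each task attempt runs in its own nested transaction; only the changes of
// nested transactions that were kept become visible when the parent
// (job-level) transaction commits.
public final class NestedTxSketch {

  interface TxClient {
    long startLong();                    // parent transaction for the whole job
    long startNested(long parentTxId);   // child transaction for one task attempt
    void commitNested(long nestedTxId);  // mark this attempt's changes as the ones to keep
    void abortNested(long nestedTxId);   // discard a failed or duplicate attempt
    void commit(long parentTxId);        // publish parent + kept nested changes atomically
    void invalidate(long parentTxId);    // job failed: nothing becomes visible
  }

  interface Task {
    void run(long nestedTxId) throws Exception; // writes are tagged with the nested tx id
  }

  static void runTaskAttempt(TxClient txClient, long parentTxId, Task task) {
    long nestedTxId = txClient.startNested(parentTxId);
    try {
      task.run(nestedTxId);
      txClient.commitNested(nestedTxId); // still invisible until the parent commits
    } catch (Exception e) {
      txClient.abortNested(nestedTxId);  // a re-attempt will run in a fresh nested tx
    }
  }
}
{code}

With this model, a duplicate or failed attempt simply has its nested transaction aborted, so its writes can never become visible, and subtask operations no longer need to be idempotent.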
Issue Links
- relates to TEPHRA-96 Transaction Checkpoints - multiple write pointers per tx (Resolved)