[SAMZA-863] Support multi-threading in samza tasks - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.11.0
Fix Version/s: 0.11.0
Component/s: None
Labels:
None

Description

Currently a samza container executes the tasks sequentially in a single thread. For example, we have message 1 and 2 in the pending queue for task 1 and task 2. Task 1 will process message 1, and until its completion task 2 can process message 2. If we want to handle more messages in parallel, we have to increase the container count, e.g. from 1 to 2 in the example.

While this solution has been working for many CPU-bound job scenarios, we do see its drawback for IO-bound jobs.In this kind of jobs, the task makes IO/Network requests, i.e, db calls, rest calls or external service RPC calls. These IO calls significantly slow down the task processing. We can increase container number in order to parallelize the IO calls, but it results in low CPU utilization. If we can improve CPU utilization by allocating multiple contains in the same CPU core, it will still cause dramatic memory growth due to the memory being allocated for each container.

To better scale the performance of IO-bound jobs, we are proposing to support multi-threaded processing in samza. The design proposal will come soon.

rbs:
https://reviews.apache.org/r/48243/: ~~SAMZA-961~~: Async tasks and multithreading model

https://reviews.apache.org/r/48213/: ~~SAMZA-960~~: Make system producer thread safe

https://reviews.apache.org/r/48182/: ~~SAMZA-958~~: Make store/cache thread safe

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

DESIGN-SAMZA-863-0.pdf
12/Feb/16 18:23
367 kB
Xinyu Liu
DESIGN-SAMZA-863-1.pdf
01/Mar/16 01:59
405 kB
Xinyu Liu
DESIGN-SAMZA-863-2.pdf
09/Mar/16 00:39
417 kB
Xinyu Liu
DESIGN-SAMZA-863-3.pdf
09/Mar/16 00:44
417 kB
Xinyu Liu
perf-test-results.pdf
21/Jun/16 23:47
144 kB
Xinyu Liu
SAMZA-863.0.patch
18/Jul/16 18:21
189 kB
Xinyu Liu
SAMZA-863.1.patch
19/Jul/16 18:40
189 kB
Xinyu Liu

Sub-Tasks

1.	Make store/cache thread safe	Resolved	Xinyu Liu
2.	Make system producer thread safe	Resolved	Xinyu Liu
3.	Async tasks and multithreading model	Resolved	Xinyu Liu
4.	User doc for samza multithreading	Resolved	Xinyu Liu

Activity

People

Assignee:: Xinyu Liu

Reporter:: Xinyu Liu

Votes:: 1 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 26/Jan/16 20:15

Updated:: 20/Dec/17 21:37

Resolved:: 20/Jul/16 17:00