[BEAM-9440] Performance Issues with Beam Runners compared with Native Systems - ASF JIRA

Details

Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: runner-apex, runner-flink, runner-spark
Labels: None

Description

While evaluating the performance of Apache Beam with the Spark runner, I found that even for a simple word count on a text file, Beam with the Spark runner was about 5 times slower than native Spark, on a dataset as small as 14 GB.
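
For readers unfamiliar with the workload, here is a minimal sketch in plain Python of what a word count computes and how a slowdown factor like the one quoted above is derived. It is not the code used in the evaluation (that lives in the linked repository); the function names `word_count` and `slowdown_factor` are hypothetical, and splitting on whitespace is an assumption about tokenization.

```python
from collections import Counter


def word_count(lines):
    """Count whitespace-delimited words across an iterable of text lines.

    Assumption: words are split on whitespace only; the actual evaluation
    may tokenize differently.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def slowdown_factor(beam_seconds, native_seconds):
    """Return the ratio of Beam-on-runner wall time to native wall time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return beam_seconds / native_seconds
```

With this definition, a job that takes 50 seconds on Beam with the Spark runner and 10 seconds on native Spark has a slowdown factor of 5.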

More details on this evaluation are available at https://github.com/soumabrata-chakraborty/spark-vs-beam/blob/master/README.md

I also came across an analysis titled "Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems" (https://arxiv.org/pdf/1907.08302.pdf / https://ieeexplore.ieee.org/document/8884832).

According to it, the slowdown was at least a factor of 3 in most scenarios, with the worst case being a factor of 58.

While it is understood that an abstraction layer comes with some performance cost, the current cost seems very high.

Issue Links

is related to

BEAM-2274 beam on spark runner run much slower than using spark

Resolved

supersedes

BEAM-2274 beam on spark runner run much slower than using spark

Resolved

People

Assignee: Unassigned

Reporter: Soumabrata Chakraborty

Votes: 0

Watchers: 8

Dates

Created: 04/Mar/20 19:34

Updated: 04/Jun/22 14:41