[BEAM-2888] Runner Comparison / Capability Matrix revamp - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: website
Labels:
- full-time
- gsoc2022
- gsod
- gsod2019
- gsod2022
- mentor

Description

The goal for this project has changed: We now want to create a completely new Capability Matrix that is based on the ValidatesRunner tests that we run on the various Apache Beam runners.

We can use the test in ./test-infra/validates-runner/ to generate a JSON file that contains the capabilities supported by various runners and tested by each individual test.
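
As a rough illustration of what such a JSON file might contain, the sketch below groups per-test outcomes by runner and capability. The record fields (`runner`, `test`, `capability`, `passed`), the runner and test names, and the output layout are all assumptions made for illustration. They are not the format produced by the tooling in ./test-infra/validates-runner/.

```python
import json
from collections import defaultdict


def build_capability_report(results):
    """Group hypothetical test outcomes by runner, then by capability.

    `results` is an iterable of dicts with the assumed keys
    "runner", "test", "capability", and "passed".
    """
    report = defaultdict(lambda: defaultdict(list))
    for record in results:
        report[record["runner"]][record["capability"]].append(
            {"test": record["test"], "passed": bool(record["passed"])}
        )
    # Convert the nested defaultdicts to plain dicts before serializing.
    return {runner: dict(caps) for runner, caps in report.items()}


# Made-up sample data; the runner, test, and capability names are placeholders.
sample_results = [
    {"runner": "RunnerA", "test": "test_side_input_basic",
     "capability": "side_inputs", "passed": True},
    {"runner": "RunnerA", "test": "test_event_time_timer",
     "capability": "timers", "passed": False},
    {"runner": "RunnerB", "test": "test_side_input_basic",
     "capability": "side_inputs", "passed": True},
]

report = build_capability_report(sample_results)
report_json = json.dumps(report, indent=2, sort_keys=True)
```

Keeping the individual test names in the output would let a reader trace any matrix cell back to the tests behind it.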

----------------------------------------------------

Discussion: https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E

Summarizing that discussion, we have a lot of issues/wishes. Some can be addressed as one-off and some need a unified reorganization of the runner comparison.

Basic corrections:

Remove rows that are impossible not to support (ParDo)
Remove rows where "support" doesn't really make sense (Composite transforms)
Deduplicate rows that are actually the same model feature (all non-merging windowing / all merging windowing)
Clearly separate rows that represent optimizations (Combine)
Correct rows in the wrong place (Timers are actually a "what...?" row)
Separate or remove rows for features that have not been designed ([Meta]Data driven triggers, retractions)
Rename rows with names that appear nowhere else (Timestamp control, which is called a TimestampCombiner in Java)
Switch to a more distinct color scheme for full/partial support (currently just solid/faded colors)
Switch to something clearer than "~" for partial support, versus ✘ and ✓ for none and full.
Correct Gearpump support for merging windows (see BEAM-2759)
Correct Spark support for non-merging and merging windows (see BEAM-2499)

Minor rewrites:

Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
Name sections the way users see them, such as "ParDo" / "side inputs" rather than "What?" / "side inputs"
Add rows for non-model things, like portability framework support, metrics backends, etc

Bigger rewrites:

Add versioning to the comparison, as in BEAM-166
Find a way to fit in a plain-English summary of each runner's level of support. It should come first, as it is what new users need before getting to details.
Find a way to describe the production readiness of runners and/or include testimonials from those using them in production.
Have a place to compare non-model differences between runners

Changes requiring engineering efforts:

Gather and add quantitative runner metrics: perhaps Nexmark results for mid-level comparisons, smaller benchmarks that measure aspects of specific features, and larger end-to-end benchmarks that give an idea of how a runner might actually perform on a use case
Tighter coupling of the matrix portion of the comparison with tags on ValidatesRunner tests
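
One way to picture this coupling is to derive each matrix cell directly from the outcomes of tagged tests. The sketch below is a hypothetical illustration: the tag names, the `(runner, tag, passed)` tuples, and the rule that maps pass counts to full/partial/none are all assumptions, not part of the existing ValidatesRunner setup.

```python
from collections import Counter


def support_level(passed, total):
    """Map pass counts to a coarse support label (assumed rule)."""
    if total == 0:
        return "untested"
    if passed == total:
        return "full"
    if passed == 0:
        return "none"
    return "partial"


def matrix_from_tagged_results(outcomes):
    """Build {runner: {tag: level}} from (runner, tag, passed) tuples."""
    totals = Counter()
    passes = Counter()
    for runner, tag, passed in outcomes:
        totals[(runner, tag)] += 1
        if passed:
            passes[(runner, tag)] += 1
    matrix = {}
    for (runner, tag), total in totals.items():
        level = support_level(passes[(runner, tag)], total)
        matrix.setdefault(runner, {})[tag] = level
    return matrix


# Made-up outcomes; runner names and tags are placeholders.
outcomes = [
    ("RunnerA", "timers", True),
    ("RunnerA", "timers", True),
    ("RunnerA", "merging_windows", True),
    ("RunnerA", "merging_windows", False),
    ("RunnerB", "timers", False),
]

matrix = matrix_from_tagged_results(outcomes)
```

With a rule like this, a row appears in the matrix only when at least one test carries the corresponding tag, so rows that are not backed by tests disappear on their own.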

If you care to address some aspect of this, please reach out and/or just file a subtask and address it.

Issue Links

is related to

BEAM-2944 Update Beam capability matrix using Nexmark (Open)

links to

GitHub Pull Request #8576

GitHub Pull Request #13492

GitHub Pull Request #14545

Sub-Tasks

Design for runner comparison with narrative intro for each runner (Open, Unassigned)

People

Assignee: Unassigned

Reporter: Kenneth Knowles

Votes: 0

Watchers: 6

Dates

Created: 11/Sep/17 17:49

Updated: 13/Apr/23 11:17

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

19h
