Beam / BEAM-2888

Runner Comparison / Capability Matrix revamp



    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: website
    • Labels:


      Discussion: https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E

      Summarizing that discussion, we have a lot of issues and wishes. Some can be addressed as one-offs, and some need a unified reorganization of the runner comparison.

      Basic corrections:

      • Remove rows that are impossible not to support (ParDo)
      • Remove rows where "support" doesn't really make sense (Composite transforms)
      • Deduplicate rows that are actually the same model feature (all non-merging windowing / all merging windowing)
      • Clearly separate rows that represent optimizations (Combine)
      • Correct rows in the wrong place (Timers are actually a "what...?" row)
      • Separate or remove rows for features that have not been designed ([Meta]Data driven triggers, retractions)
      • Rename rows with names that appear nowhere else (Timestamp control, which is called a TimestampCombiner in Java)
      • Switch to a more distinct color scheme for full/partial support (currently just solid/faded colors)
      • Switch to something clearer than "~" for partial support, versus ✘ and ✓ for none and full.
      • Correct Gearpump support for merging windows (see BEAM-2759)
      • Correct Spark support for non-merging and merging windows (see BEAM-2499)
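
      One way to picture the color-scheme and "~" items above is to store each cell as an explicit support level and render it with a spelled-out label. The following is only a minimal sketch: the `Support` enum, the `LABELS` mapping, the "◐" symbol, and the runner names `RunnerA` and `RunnerB` are all hypothetical and are not taken from the Beam website source.

```python
from enum import Enum


class Support(Enum):
    """Hypothetical support levels for one cell of the capability matrix."""
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


# Assumed labels: each symbol is paired with a word, so partial support
# no longer depends on "~" or on a faded color to be understood.
LABELS = {
    Support.FULL: "✓ full",
    Support.PARTIAL: "◐ partial",
    Support.NONE: "✘ none",
}


def render_cell(level, note=""):
    """Return the text for one cell, with an optional explanatory note."""
    label = LABELS[level]
    return f"{label} ({note})" if note else label


def render_matrix(matrix):
    """Render {row: {runner: (level, note)}} as one text line per row."""
    lines = []
    for row, cells in matrix.items():
        rendered = ", ".join(
            f"{runner}: {render_cell(level, note)}"
            for runner, (level, note) in sorted(cells.items())
        )
        lines.append(f"{row} | {rendered}")
    return "\n".join(lines)


# Placeholder data only; these are not real runner capabilities.
example = {
    "Merging windowing": {
        "RunnerA": (Support.FULL, ""),
        "RunnerB": (Support.PARTIAL, "placeholder caveat"),
    },
}
print(render_matrix(example))
```

      Keeping the level separate from its rendering would let the site change colors or symbols without touching the underlying data.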

      Minor rewrites:

      • Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
      • Organize sections as users see them, like "ParDo" / "Side inputs", not "What?" / "side inputs"
      • Add rows for non-model things, like portability framework support, metrics backends, etc.

      Bigger rewrites:

      • Add versioning to the comparison, as in BEAM-166
      • Find a way to fit in a plain-English summary of each runner's level of support. It should come first, as it is what new users need before getting to details.
      • Find a way to describe the production readiness of runners and/or include testimonials from those using them in production.
      • Have a place to compare non-model differences between runners

      Changes requiring engineering efforts:

      • Gather and add quantitative runner metrics: perhaps Nexmark results for mid-level comparisons, smaller benchmarks that measure aspects of specific features, and larger end-to-end benchmarks that indicate how a runner might actually perform on a real use case
      • Tighter coupling of the matrix portion of the comparison with tags on ValidatesRunner tests
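
      The tighter coupling with ValidatesRunner tags could, for instance, mean generating matrix cells from tagged test outcomes instead of editing them by hand. The sketch below assumes test results are available as `(runner, feature_tag, passed)` tuples; that input shape, the tag names, and the rule for mapping pass counts to support levels are assumptions made for illustration, not an existing Beam tool.

```python
from collections import defaultdict


def support_from_results(results):
    """Derive a support level per (feature tag, runner) from test outcomes.

    `results` is an iterable of (runner, feature_tag, passed) tuples.
    Assumed rule: all tests pass -> "full", none pass -> "none",
    anything in between -> "partial".
    """
    counts = defaultdict(lambda: [0, 0])  # (feature, runner) -> [passed, total]
    for runner, feature, passed in results:
        entry = counts[(feature, runner)]
        if passed:
            entry[0] += 1
        entry[1] += 1

    matrix = defaultdict(dict)
    for (feature, runner), (ok, total) in counts.items():
        if ok == total:
            level = "full"
        elif ok == 0:
            level = "none"
        else:
            level = "partial"
        matrix[feature][runner] = level
    return dict(matrix)


# Placeholder outcomes; runner and tag names are hypothetical.
sample_results = [
    ("RunnerA", "timers", True),
    ("RunnerA", "timers", True),
    ("RunnerB", "timers", True),
    ("RunnerB", "timers", False),
    ("RunnerB", "state", False),
]
print(support_from_results(sample_results))
```

      A runner that never ran any test for a tag simply has no cell here; how to display that case (unknown versus unsupported) would still need to be decided.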

      If you care to address some aspect of this, please reach out and/or just file a subtask and address it.


              • Assignee: Kenneth Knowles (kenn)
              • Votes: 0
              • Watchers: 5
              • Created:
              • Time Tracking:
                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 2h 20m