Flink / FLINK-31125

Flink ML benchmark framework should minimize the source operator overhead

Details

    Description

      The Flink ML benchmark framework estimates throughput in three steps: a source operator generates a given number (e.g. 10^7) of input records with random values, the given AlgoOperator processes these records, and the number of records is divided by the total execution time.

      The overhead of generating random values for all input records has an observable impact on the estimated throughput. We would like to minimize the overhead of the source operator so that the benchmark result reflects the throughput of the AlgoOperator as closely as possible.
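
      To make the effect concrete, here is a minimal sketch of the throughput calculation described above. The class name, method name, and the 5 s / 15 s timing split are hypothetical values chosen for illustration; they are not measurements from the Flink ML benchmark.

```java
public class ThroughputEstimate {

    /** Records per second, given a record count and an elapsed time in seconds. */
    public static double recordsPerSecond(long numRecords, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return numRecords / elapsedSeconds;
    }

    public static void main(String[] args) {
        long numRecords = 10_000_000L;   // 10^7 records, as in the description
        double sourceSeconds = 5.0;      // hypothetical time spent generating random values
        double operatorSeconds = 15.0;   // hypothetical time spent in the AlgoOperator

        // What the benchmark reports: the source time is included in the denominator.
        double measured = recordsPerSecond(numRecords, sourceSeconds + operatorSeconds);

        // What we would like to approximate: the throughput of the AlgoOperator alone.
        double operatorOnly = recordsPerSecond(numRecords, operatorSeconds);

        System.out.printf("measured: %.0f records/s%n", measured);
        System.out.printf("operator only: %.0f records/s%n", operatorOnly);
    }
}
```

      Under these assumed numbers the reported throughput is 25% lower than the operator-only figure, which is the kind of gap this issue aims to shrink.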

      Note that spark-sql-perf generates all input records in memory before running the benchmark. This allows the Spark ML benchmark to read records from memory instead of generating their values while the benchmark is running.

      We can generate a value once and reuse it for all input records. This approach minimizes the source operator overhead and allows us to compare the Flink ML benchmark results fairly with the Spark ML benchmark results (from spark-sql-perf).
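
      The idea of generating a value once and reusing it can be sketched in plain Java as follows. This is an illustration only, not the actual Flink ML source operator: the class ReusedRecordSource, its constructor parameters, and the use of a double[] as the record type are all assumptions made for the example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;

/** Hypothetical source that pays the random-generation cost once, not once per record. */
public class ReusedRecordSource implements Iterator<double[]> {
    private final double[] record;   // generated once in the constructor
    private final long numRecords;   // total number of records to emit
    private long emitted = 0;

    public ReusedRecordSource(long numRecords, int dim, long seed) {
        this.numRecords = numRecords;
        this.record = new double[dim];
        Random random = new Random(seed);
        for (int i = 0; i < dim; i++) {
            record[i] = random.nextDouble();
        }
    }

    @Override
    public boolean hasNext() {
        return emitted < numRecords;
    }

    @Override
    public double[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        emitted++;
        // Same array instance every time: no per-record random number generation.
        return record;
    }

    public static void main(String[] args) {
        ReusedRecordSource source = new ReusedRecordSource(10_000_000L, 10, 42L);
        long count = 0;
        while (source.hasNext()) {
            source.next();
            count++;
        }
        System.out.println("emitted " + count + " records");
    }
}
```

      Because every call returns the same array, a consumer that mutates records would need a copy. Identical records may also exercise some algorithms differently than varied data would, which is a trade-off to keep in mind when interpreting the results.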

            People

              lindong Dong Lin