Apache Arrow / ARROW-11727

[C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark

    Details

      Description

      In the Flight benchmark, a Boost accumulator is used to estimate latency quantiles (0.5, 0.95, 0.99). Internally, Boost uses the P-Square algorithm [1]. P-Square is very poor at estimating tail quantiles such as 0.99, which is where TDigest shines.

      Test results show that the actual 0.99 latency is much lower than what the current code reports. We should switch to TDigest.

      • run flight-benchmark with default parameters
      • calculate the 0.99 quantile of the latencies
      • compare the exact value (computed from all stored data points), the TDigest estimate, and the Boost P-Square estimate
      • repeat for 5 rounds
        Exact  TDigest  Boost-P2
        86     93       2130
        175    235      1526
        151    165      1926
        147    153      302
        251    313      561
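
      The "exact" baseline and the size of each estimator's error can be sketched as follows. This is only an illustration, not the benchmark's actual code: the helper names ExactQuantile and RelativeError are hypothetical, and the nearest-rank quantile definition is an assumption, since the issue does not say which definition was used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: exact q-quantile over all stored latency samples,
// using the nearest-rank definition (an assumption, see above).
// The samples are taken by value because std::nth_element reorders them.
double ExactQuantile(std::vector<double> samples, double q) {
  assert(!samples.empty());
  assert(q >= 0.0 && q <= 1.0);
  const std::size_t n = samples.size();
  // Nearest rank: smallest 1-based rank r with r >= q * n, clamped to [1, n].
  std::size_t rank =
      static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
  rank = std::max<std::size_t>(1, std::min(rank, n));
  auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
  // Partial sort: places the rank-th smallest element at position `nth`.
  std::nth_element(samples.begin(), nth, samples.end());
  return *nth;
}

// Hypothetical helper: relative error of an estimate against the exact value.
double RelativeError(double estimate, double exact) {
  return std::abs(estimate - exact) / exact;
}
```

      Applied to the first round above, RelativeError(93, 86) is roughly 0.08 for TDigest, while RelativeError(2130, 86) is above 23 for Boost P-Square. Storing every sample is acceptable for a one-off comparison like this, but it is the memory cost that streaming estimators such as TDigest and P-Square are meant to avoid.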
        

      TDigest gives more accurate values for all quantiles. For the 0.5 quantile, both TDigest and Boost give very accurate results. For the 0.95 quantile, TDigest gives an almost exact value, while Boost deviates slightly.

      [1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

              People

              • Assignee:
                Yibo Cai
                Reporter:
                Yibo Cai


                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h