  Spark / SPARK-35258

Enhance ESS ExternalBlockHandler with additional block rate-based metrics and histograms


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0
    • Component/s: Shuffle, YARN
    • Labels: None

    Description

      Today the ExternalBlockHandler component of the External Shuffle Service (ESS) exposes some useful metrics, but it lacks metrics for the rate of block transfers. We have blockTransferRateBytes to tell us the rate of bytes, but no metric for the rate of blocks. The block rate is especially relevant when running the ESS on HDDs, which are sensitive to random reads: many small block transfers can hurt performance, yet they won't show up as a spike in blockTransferRateBytes because each block is small.
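      To illustrate why a block-count rate catches what blockTransferRateBytes misses, here is a minimal Python sketch (not Spark code). SimpleMeter is a hypothetical stand-in for a Dropwizard-style Meter that only computes a mean rate, measure() is a hypothetical helper, and the two workloads are invented numbers chosen so that both move 10 MiB in 10 seconds.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB


@dataclass
class SimpleMeter:
    """Hypothetical stand-in for a metrics Meter: a running count plus a mean rate.

    Real Dropwizard meters also track exponentially weighted 1/5/15-minute
    rates; this sketch only computes count / elapsed time.
    """
    count: int = 0

    def mark(self, n: int = 1) -> None:
        self.count += n

    def mean_rate(self, elapsed_seconds: float) -> float:
        return self.count / elapsed_seconds


def measure(block_sizes, elapsed_seconds):
    """Return (bytes/sec, blocks/sec) for one batch of transferred blocks."""
    bytes_meter = SimpleMeter()   # analogous to blockTransferRateBytes
    blocks_meter = SimpleMeter()  # hypothetical block-count rate meter
    for size in block_sizes:
        bytes_meter.mark(size)    # add the block's size in bytes
        blocks_meter.mark()       # add one block
    return (bytes_meter.mean_rate(elapsed_seconds),
            blocks_meter.mean_rate(elapsed_seconds))


# Two hypothetical workloads that each move 10 MiB in 10 seconds.
large_blocks = [1 * MIB] * 10        # 10 blocks of 1 MiB
small_blocks = [1 * KIB] * 10_240    # 10,240 blocks of 1 KiB

print(measure(large_blocks, 10.0))  # (1048576.0, 1.0)
print(measure(small_blocks, 10.0))  # (1048576.0, 1024.0)
```

      Both workloads report the same byte rate, but the small-block workload performs 1,024 times as many block reads per second, which on an HDD can mean far more random seeks. Only the block-count rate makes that difference visible.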

      We can also enhance YarnShuffleServiceMetrics to expose histogram-style metrics from the Timer instances within ExternalBlockHandler. Today it exposes only the count and rate of each Timer, not the timing distribution available from its Snapshot.
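      The histogram-style export could look roughly like the following Python sketch, which flattens a timer's recorded durations into one value per statistic. The statistics mirror those a Dropwizard Snapshot provides (min, max, mean, standard deviation, median, and upper percentiles), but snapshot_metrics(), nearest_rank(), the `<timer>_<stat>` naming scheme, the timer name, and the sample data are all hypothetical. It also uses a plain nearest-rank percentile over every sample rather than the sampled reservoir a real Timer relies on.

```python
import math
import statistics


def nearest_rank(sorted_values, quantile):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(quantile * len(sorted_values)))
    return sorted_values[rank - 1]


def snapshot_metrics(timer_name, samples_ms):
    """Flatten a timer's recorded durations into histogram-style values.

    The suffixes mirror the statistics a Dropwizard Snapshot exposes;
    the naming scheme itself is hypothetical.
    """
    values = sorted(samples_ms)
    return {
        f"{timer_name}_count": len(values),
        f"{timer_name}_min": values[0],
        f"{timer_name}_max": values[-1],
        f"{timer_name}_mean": statistics.fmean(values),
        f"{timer_name}_stdDev": statistics.pstdev(values),
        f"{timer_name}_median": nearest_rank(values, 0.50),
        f"{timer_name}_75thPercentile": nearest_rank(values, 0.75),
        f"{timer_name}_95thPercentile": nearest_rank(values, 0.95),
        f"{timer_name}_99thPercentile": nearest_rank(values, 0.99),
    }


# 100 hypothetical request latencies: 1 ms, 2 ms, ..., 100 ms.
samples = list(range(1, 101))
m = snapshot_metrics("requestLatencyMillis", samples)
print(m["requestLatencyMillis_median"], m["requestLatencyMillis_99thPercentile"])  # 50 99
```

      A count and rate alone would say only that 100 requests happened; the percentiles show how slow the tail requests were. In the YARN shuffle service, values like these would be reported through the Hadoop metrics2 MetricsSource that YarnShuffleServiceMetrics implements, alongside the count and rate it already exposes.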

      These two changes can make it easier to monitor the health of the ESS.


            People

              Assignee: Erik Krogen (xkrogen)
              Reporter: Erik Krogen (xkrogen)
              Votes: 0
              Watchers: 3
