  1. Apache Celeborn
  2. CELEBORN-1749

The applicationId label metrics lost

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.6.0

    Description

      Issue 2:
      I am working on the Celeborn alert SOP for identifying the applications with the highest usage.
      With this expression, I cannot find the applicationId that wrote 140TB of shuffle data.

      ```
      topk(5, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId=~"^application_.*"}))
      ```
      <img width="1491" alt="image" src="https://github.com/user-attachments/assets/c7caa5d1-c99c-4062-8c78-e7bd8ed5c3db">

      But with this expression, the shuffle size matches.
      ```
      topk(5, sum by (name) (metrics_diskBytesWritten_Value{role="worker", applicationId=""}))
      ```
      <img width="1490" alt="image" src="https://github.com/user-attachments/assets/da7f53c5-cc75-4856-97f8-fb12ec80addc">

      Please note that the Celeborn cluster was not yet taking traffic, and only one test application was running at that time.

      <img width="937" alt="image" src="https://github.com/user-attachments/assets/8d2a1833-b982-487f-988a-b1b4db1764bd">

      Per the metrics, it seems that some `metrics_diskBytesWritten_Value` series with the applicationId label were lost.
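
      One way to quantify the gap is to compare the two totals directly. This is only a sketch: it assumes the series with an empty applicationId represent the worker-level total, and that the per-application series should add up to roughly the same amount. Neither assumption is confirmed here.

      ```
      # Fraction of the worker-level bytes written that is covered by per-application series.
      # Assumption: series with applicationId="" are the worker-level totals.
      sum(metrics_diskBytesWritten_Value{role="worker", applicationId=~"^application_.*"})
        /
      sum(metrics_diskBytesWritten_Value{role="worker", applicationId=""})
      ```

      Under those assumptions, a ratio well below 1 would be consistent with per-application series being lost, while a ratio near 1 would point elsewhere.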

            People

              feiwang Fei Wang
