  1. Flink
  2. FLINK-25480

Create dashboard/monitoring to see resource usage per E2E test

Details

    Description

      Over the past couple of weeks, multiple tests have failed with out-of-memory errors and/or exit code 137 (128 + 9, meaning the process was killed with SIGKILL, which typically points to the kernel OOM killer). These failures occur on both the Alibaba CI machines and the Azure hosted agents. For the Alibaba CI machines, we've mitigated the problem by reducing the number of workers per CI machine from 7 to 5. Each worker can spin up multiple Docker containers, particularly as Testcontainers is used more and more.

      Insight into the resource usage per end-to-end test would also help us debug test infrastructure problems more quickly.
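
      As a starting point for discussion, here is a minimal sketch of how per-container memory could be sampled on a CI worker. It is not an existing Flink tool. It assumes the Docker CLI is available on the machine and relies on `docker stats --no-stream --format`; the function names (`parse_mem_usage`, `parse_stats`, `sample_containers`, `update_peaks`) are hypothetical. Mapping container names back to individual E2E tests is left open, since that depends on how the tests name their containers.

```python
import re
import subprocess

# Multipliers for the binary size units printed by `docker stats`.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_mem_usage(mem_usage: str) -> int:
    """Convert a MemUsage field such as '12.5MiB / 7.77GiB' to bytes used."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", used)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised memory value: {mem_usage!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def parse_stats(output: str) -> dict[str, int]:
    """Parse tab-separated 'name<TAB>MemUsage' lines into {name: bytes_used}."""
    usage = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, mem = line.split("\t", 1)
        usage[name] = parse_mem_usage(mem)
    return usage


def sample_containers() -> dict[str, int]:
    """Take one snapshot of memory usage for all running containers."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemUsage}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_stats(out)


def update_peaks(peaks: dict[str, int], sample: dict[str, int]) -> dict[str, int]:
    """Keep the highest memory value seen so far for each container."""
    for name, used in sample.items():
        peaks[name] = max(peaks.get(name, 0), used)
    return peaks
```

      Calling `sample_containers()` periodically while a test runs and folding each snapshot into `update_peaks` would give a peak-memory figure per container, which could then be published to whatever dashboard is chosen.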

          People

            Assignee: Unassigned
            Reporter: Martijn Visser (martijnvisser)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: