Flink / FLINK-10226

Latency metrics can choke job-manager


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Runtime / Metrics
    • Labels: None

    Description

      With Flink 1.5.0, my Apache Beam job was not runnable unless I turned off the latencyTracking feature. The job generated a huge number of latency metrics and histogram aggregates, and updating them kept the JobManager so busy that the cluster fell apart.

      This was discussed on mailing list:

      http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-cluster-crashing-going-from-1-4-0-gt-1-5-3-td23941.html

      The purpose of this ticket is to reason about how to improve this, and on which side. I am currently not sure what the root cause is:
      a) the Beam-to-Flink translation generates too many "noise operators"
      b) Flink does not handle latencyTracking well for large jobs
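      To illustrate why both hypotheses could matter, here is a rough back-of-the-envelope sketch of how many latency histograms a job might register. It assumes (this is not stated in the ticket) that each non-source operator subtask keeps one latency histogram per source subtask whose markers reach it, and that every connection is all-to-all (e.g. keyBy or rebalance), so the result is an upper bound. The helper name and the parallelism figures are hypothetical.

```python
from typing import Sequence


def latency_histogram_count(source_subtasks: int, operator_parallelisms: Sequence[int]) -> int:
    """Upper bound on latency histograms for a job (hypothetical helper).

    Assumption: each non-source operator subtask keeps one histogram per
    source subtask, and every connection is all-to-all, so every latency
    marker reaches every downstream subtask.
    """
    if source_subtasks < 0 or any(p < 0 for p in operator_parallelisms):
        raise ValueError("parallelism must be non-negative")
    # One histogram per (source subtask, operator subtask) pair.
    return source_subtasks * sum(operator_parallelisms)


if __name__ == "__main__":
    # Hypothetical job: one source at parallelism 100 feeding 50 operators,
    # each at parallelism 100 (e.g. many small operators from a translation).
    print(latency_histogram_count(100, [100] * 50))  # 500000
    # Halving the number of operators halves the estimate.
    print(latency_histogram_count(100, [100] * 25))  # 250000
```

      Under this assumption the count grows linearly with the number of operators and with the product of source and operator parallelism, so extra translated operators (a) and large jobs (b) would each multiply the metric load.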


    People

      Assignee: Unassigned
      Reporter: Jozef Vilcek