  1. Flink
  2. FLINK-2586

Unstable Storm Compatibility Tests

Details

    Description

      The Storm Compatibility tests frequently fail.

      The reason is that they kill the topologies after a fixed time interval. On CI infrastructure this can fail when individual steps take longer than usual. Trying to guarantee progress by time is inherently problematic:

      • Waiting too briefly makes the tests unstable
      • Waiting too long makes the tests slow

      The right way to go is letting the program decide when to terminate, for example by throwing a special SuccessException.
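
      A minimal sketch of that idea, without any Flink or Storm dependency: a validating sink removes each expected record as it arrives and throws a marker exception once nothing is missing. All names here (SelfTerminatingJobSketch, ValidatingSink, runUntilSuccess) are hypothetical and only illustrate the pattern; they are not taken from the Flink code base.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SelfTerminatingJobSketch {

    // Hypothetical marker exception; the name follows the issue text.
    static class SuccessException extends RuntimeException {
        SuccessException() {
            super("expected result reached");
        }
    }

    // Hypothetical stand-in for the validating sink at the end of a streaming job.
    static class ValidatingSink {
        private final Set<String> missing;

        ValidatingSink(Set<String> expected) {
            this.missing = new HashSet<>(expected);
        }

        // Called once per record; ends the job as soon as every expected record was seen.
        void invoke(String record) {
            missing.remove(record);
            if (missing.isEmpty()) {
                throw new SuccessException();
            }
        }
    }

    // Feeds an unbounded stream into the sink and returns once the sink signals success.
    static boolean runUntilSuccess(List<String> cycle, Set<String> expected) {
        ValidatingSink sink = new ValidatingSink(expected);
        int i = 0;
        try {
            while (true) {                      // unbounded source, comparable to a spout
                sink.invoke(cycle.get(i));
                i = (i + 1) % cycle.size();
            }
        } catch (SuccessException e) {
            return true;                        // the program itself decided to stop
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "flink", "kafka");
        Set<String> expected = new HashSet<>(Arrays.asList("flink", "kafka"));
        System.out.println(runUntilSuccess(words, expected)); // prints "true"
    }
}
```

      The run ends on the record that completes the expected set, so no sleep interval has to be tuned. In this sketch a record that never arrives would keep the loop running forever, so a real test would presumably still want a generous upper bound as a safety net.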

      Have a look at the Kafka connector tests: they do this a lot and hence run exactly as long as they need to.

      Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt

      From FLINK-2801

      The tests for the Storm compatibility layer all work with timeouts (running the program for 10 seconds) and then check whether the expected result has been written.

      That is inherently unstable and slow (long delays). They should be rewritten along the lines of, for example, the KafkaITCase tests, where the streaming jobs terminate themselves with a "SuccessException" which, when thrown by the job client, can be recognized as successful completion.
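
      On the test side, recognizing the exception might look like the following sketch. It assumes the job client wraps whatever the job throws in another exception, so the helper walks the cause chain. The names (SuccessDetectionSketch, hasSuccessCause, tryExecute) and the depth limit of 20 are illustrative assumptions, not the actual Flink test utilities.

```java
class SuccessDetectionSketch {

    // Hypothetical marker exception thrown by the job when the expected result is complete.
    static class SuccessException extends RuntimeException {
    }

    // Walks the cause chain, because the exception thrown inside the job is
    // assumed to arrive wrapped in a client-side exception.
    static boolean hasSuccessCause(Throwable t) {
        int depth = 0;
        while (t != null && depth++ < 20) {     // depth limit guards against cyclic chains
            if (t instanceof SuccessException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Runs the job; a wrapped SuccessException counts as a pass, anything else is rethrown.
    static void tryExecute(Runnable job) {
        try {
            job.run();
        } catch (RuntimeException e) {
            if (!hasSuccessCause(e)) {
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a client that wraps the job's SuccessException.
        tryExecute(() -> {
            throw new IllegalStateException("job failed", new SuccessException());
        });
        System.out.println("wrapped SuccessException treated as success");

        try {
            tryExecute(() -> {
                throw new IllegalStateException("real failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("real failure still propagates: " + e.getMessage());
        }
    }
}
```

      Only the marker exception is swallowed; every other failure still surfaces, so a genuinely broken job continues to fail the test.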

            People

              mjsax Matthias J. Sax
              sewen Stephan Ewen