[FLINK-2586] Unstable Storm Compatibility Tests - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.10.0
Fix Version/s: 1.0.0
Component/s: Legacy Components / Storm Compatibility
Labels:
- test-stability

Description

The Storm Compatibility tests frequently fail.

The reason is that they kill the topologies after a fixed time interval. This can fail on CI infrastructure when certain steps take longer than usual. Trying to guarantee progress by time is inherently problematic:

- Waiting too short makes tests unstable
- Waiting too long makes tests slow

The right way to go is letting the program decide when to terminate, for example by throwing a special SuccessException.

Have a look at the Kafka connector tests: they use this pattern extensively and therefore run exactly as long as they need to.
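A minimal, framework-free sketch of the idea in plain Java, assuming a test sink that knows the complete expected result set. `SuccessException`, `ValidatingSink` and `SelfTerminatingJobDemo` are hypothetical names for illustration, not Flink or Storm API, and `runUnboundedJob` merely stands in for a topology that never finishes on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical marker exception: the job throws it to say "all expected results have arrived".
class SuccessException extends RuntimeException {
    SuccessException() {
        super("all expected results received");
    }
}

// Hypothetical test sink: collects results and ends the job as soon as the result set is complete.
class ValidatingSink implements Consumer<String> {
    private final Set<String> missing;
    private final List<String> received = new ArrayList<>();

    ValidatingSink(Set<String> expected) {
        this.missing = new HashSet<>(expected);
    }

    @Override
    public void accept(String value) {
        received.add(value);
        missing.remove(value);
        if (missing.isEmpty()) {
            // Stop right now instead of waiting for a fixed timeout to expire.
            throw new SuccessException();
        }
    }

    List<String> received() {
        return received;
    }
}

class SelfTerminatingJobDemo {
    // Stand-in for an unbounded streaming topology: it never finishes by itself,
    // so the only way out of the loop is an exception thrown by the sink.
    static void runUnboundedJob(Consumer<String> sink) {
        for (long i = 0; ; i++) {
            sink.accept("word-" + (i % 5));
        }
    }

    // Returns true only if the job ended because the sink signalled success.
    static boolean runAndCheck(ValidatingSink sink) {
        try {
            runUnboundedJob(sink);
        } catch (SuccessException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ValidatingSink sink = new ValidatingSink(
                new HashSet<>(Arrays.asList("word-0", "word-1", "word-2", "word-3", "word-4")));
        boolean ok = runAndCheck(sink);
        System.out.println("succeeded=" + ok + ", records=" + sink.received().size());
    }
}
```

The job runs until its result is complete and no longer: there is no timeout to tune for slow CI machines, and fast machines do not sit out a fixed interval.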

Here is an example of a failed run: https://s3.amazonaws.com/archive.travis-ci.org/jobs/77499577/log.txt

From FLINK-2801:

The tests for the Storm compatibility layer all rely on timeouts: they run the program for 10 seconds and then check whether the expected result has been written.

That approach is inherently unstable and slow (long delays). The tests should be rewritten along the lines of the KafkaITCase tests, where the streaming jobs terminate themselves with a "SuccessException", which can be recognized as successful completion when it is thrown by the job client.
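The client side of this pattern can be sketched as follows, assuming the success signal reaches the test wrapped in one or more other exceptions. Here a plain `ExecutorService` stands in for the job client, so the wrapper is an `ExecutionException`; in Flink the wrapping exception type differs. `SuccessException`, `TestJobRunner`, `isSuccessSignal` and `runExpectingSuccessException` are hypothetical names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical marker exception thrown by the streaming job once its results are complete.
class SuccessException extends RuntimeException {
}

class TestJobRunner {
    // Walks the cause chain; the identity set guards against cyclic cause chains.
    static boolean isSuccessSignal(Throwable failure) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            if (t instanceof SuccessException) {
                return true;
            }
        }
        return false;
    }

    // Runs the job on a separate thread (standing in for the job client) and treats a
    // failure caused by SuccessException as a pass. Any other failure is rethrown; a job
    // that ends without the signal is also reported, since the test job is assumed unbounded.
    static void runExpectingSuccessException(Runnable job) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(job).get();
        } catch (ExecutionException e) {
            if (isSuccessSignal(e)) {
                return;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
        throw new AssertionError("job finished without signalling success");
    }

    public static void main(String[] args) throws Exception {
        // The "job" fails with SuccessException nested inside another exception.
        runExpectingSuccessException(() -> {
            throw new RuntimeException("task failed", new SuccessException());
        });
        System.out.println("test passed");
    }
}
```

Keeping genuine failures distinguishable from the success signal is the important design point: a harness that swallowed every exception would let broken topologies pass.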


Issue Links

is duplicated by
- FLINK-2801 Rework Storm Compatibility Tests (Closed)
- FLINK-2847 Fix flaky test in StormTestBase.testJob (Closed)


People

Assignee: Matthias J. Sax
Reporter: Stephan Ewen
Votes: 0
Watchers: 4

Dates

Created: 27/Aug/15 16:53
Updated: 19/Jan/16 14:45
Resolved: 15/Jan/16 08:57