FLINK-5923

Test instability in SavepointITCase testTriggerSavepointAndResume

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Tests

      Description

      https://s3.amazonaws.com/archive.travis-ci.org/jobs/205042538/log.txt

      Failed tests: 
        SavepointITCase.testTriggerSavepointAndResume:258 Checkpoints directory not cleaned up: [/tmp/junit1029044621247843839/junit7338507921051602138/checkpoints/47fa12635d098bdafd52def453e6d66c/chk-4] expected:<0> but was:<1>
      

      I think this is due to a race in the test: when the cluster shuts down, in-progress checkpoints can linger in the checkpoints directory.
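
      A minimal sketch of one way to make such a cleanup assertion less timing-sensitive. This is an illustration only, not the fix that was merged for this issue. The class and method names (`CheckpointDirCleanup`, `countEntries`, `awaitEmpty`) are hypothetical, and the code uses only the JDK, not Flink's test utilities. Instead of asserting on the directory right after shutdown, it polls until the directory is empty or a deadline passes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
public class CheckpointDirCleanup {

    /** Counts the entries directly under dir; a missing directory counts as empty. */
    static long countEntries(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (NoSuchFileException e) {
            // The directory was deleted between the check and the listing.
            return 0;
        }
    }

    /**
     * Polls dir until it has no entries or the timeout elapses.
     * Returns true if the directory was observed empty, false on timeout.
     */
    static boolean awaitEmpty(Path dir, Duration timeout)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (countEntries(dir) == 0) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(50); // back off briefly before checking again
        }
    }
}
```

      A test could then assert `awaitEmpty(checkpointsDir, Duration.ofSeconds(10))` in place of an immediate count check. Polling only hides the race if lingering checkpoints are eventually discarded; if they are never cleaned up, the timeout still fails the test.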

          Activity

          NicoK Nico Kruber added a comment -

          I guess the following error may also originate from that race condition:

          https://s3.amazonaws.com/archive.travis-ci.org/jobs/205888798/log.txt

          testTriggerSavepointAndResume(org.apache.flink.test.checkpointing.SavepointITCase)  Time elapsed: 1.581 sec  <<< ERROR!
          java.io.IOException: Unable to delete file: /tmp/junit1592062472104041767/junit8429426931866360142/checkpoints/5ec09c5215b989bd25752be56ca02a46/chk-5/15b909b5-f375-45e5-8737-10935d77c9a4
          	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
          	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
          	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
          	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
          	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
          	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
          	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
          	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
          	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
          	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
          	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
          	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
          	at org.apache.flink.test.checkpointing.SavepointITCase.testTriggerSavepointAndResume(SavepointITCase.java:411)
          
          aljoscha Aljoscha Krettek added a comment -

          Another instance: https://s3.amazonaws.com/archive.travis-ci.org/jobs/206139042/log.txt

          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3427

          Three of my local Travis runs passed:
          https://travis-ci.org/uce/flink/builds/206125226
          https://travis-ci.org/uce/flink/builds/206125321
          https://travis-ci.org/uce/flink/builds/206125360

          In progress: https://travis-ci.org/uce/flink/builds/206125404

          I think we are good to go to merge this.

          StephanEwen Stephan Ewen added a comment -

          Fixed via c24c7ec3332d0eb6ebb24eb70c9aabd055cc129f

          Note: Wrong JIRA tag in commit message!

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3427


            People

            • Assignee: uce Ufuk Celebi
            • Reporter: uce Ufuk Celebi