Flink / FLINK-23266

HA per-job cluster (rocks, non-incremental) hangs on Azure

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.4
    • Fix Version/s: 1.12.6
    • Component/s: None
    • Labels:

      Description

      https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19943&view=logs&j=6caf31d6-847a-526e-9624-468e053467d6&t=0b23652f-b18b-5b6e-6eb6-a11070364610&l=1858

      Jul 05 21:56:00 ==============================================================================
      Jul 05 21:56:00 Running 'Running HA per-job cluster (rocks, non-incremental) end-to-end test'
      Jul 05 21:56:00 ==============================================================================
      Jul 05 21:56:00 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-00772599944
      Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
      Jul 05 21:56:00 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
      Jul 05 21:56:01 Starting zookeeper daemon on host fv-az43-4.
      Jul 05 21:56:01 Running on HA mode: parallelism=4, backend=rocks, asyncSnapshots=true, incremSnapshots=false and zk=3.4.
      Jul 05 21:56:03 Starting standalonejob daemon on host fv-az43-4.
      Jul 05 21:56:03 Start 1 more task managers
      Jul 05 21:56:04 Starting taskexecutor daemon on host fv-az43-4.
      Jul 05 21:56:10 Job (00000000000000000000000000000000) is not yet running.
      Jul 05 21:56:18 Job (00000000000000000000000000000000) is running.
      Jul 05 21:56:18 Running JM watchdog @ 266158
      Jul 05 21:56:18 Running TM watchdog @ 266159
      Jul 05 21:56:18 Waiting for text Completed checkpoint [1-9]* for job 00000000000000000000000000000000 to appear 2 of times in logs...
      Jul 05 21:56:22 Killed JM @ 264313
      Jul 05 21:56:22 Waiting for text Completed checkpoint [1-9]* for job 00000000000000000000000000000000 to appear 2 of times in logs...
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      Jul 05 21:56:26 Killed TM @ 264571
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      Jul 05 21:56:26 Starting standalonejob daemon on host fv-az43-4.
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-1*.log: No such file or directory
      Jul 05 21:57:12 Killed JM @ 267798
      Jul 05 21:57:12 Waiting for text Completed checkpoint [1-9]* for job 00000000000000000000000000000000 to appear 2 of times in logs...
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
      Jul 05 21:57:15 Starting standalonejob daemon on host fv-az43-4.
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-2*.log: No such file or directory
      /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_ha.sh: line 151: [: 58)\n\tat org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java: integer expression expected
      Jul 05 21:58:07 Killed JM @ 271440
      Jul 05 21:58:07 Waiting for text Completed checkpoint [1-9]* for job 00000000000000000000000000000000 to appear 2 of times in logs...
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
      Jul 05 21:58:09 Killed TM @ 267660
      Jul 05 21:58:09 Starting standalonejob daemon on host fv-az43-4.
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
      grep: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT/log/*standalonejob-3*.log: No such file or directory
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      Jul 05 21:58:51 Killed TM @ 
      Jul 05 22:11:00 Test (pid: 263840) did not finish after 900 seconds.
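
The log above shows two shell-level symptoms before the timeout: the `[` test at `common_ha.sh` line 151 received a stack-trace fragment where an integer was expected, and `kill` was invoked with an empty PID ("Killed TM @ "). The following is a minimal sketch of how such values could be validated before use. It is not the actual `common_ha.sh` code and makes no claim about the root cause of the hang; `is_integer` and `safe_kill` are hypothetical helper names introduced only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if $1 is a non-empty string of digits.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
        *) return 0 ;;
    esac
}

# Hypothetical helper: sends the default signal to $1 only if it looks like a PID.
# Returns 1 (and prints a message to stderr) instead of calling kill with a bad argument.
safe_kill() {
    if ! is_integer "$1"; then
        echo "Skipping kill: invalid PID '$1'" >&2
        return 1
    fi
    kill "$1"
}
```

With guards like these, a watchdog loop could report an unparsable checkpoint count or a missing PID explicitly rather than passing it on to `[` or `kill`.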
      

            People

            • Assignee: Unassigned
            • Reporter: Xintong Song (xtsong)