Flink / FLINK-12150

Flink on Mesos fails to create a TaskManager when the cache directory is gone

Details

    Description

      The Mesos fetcher tries to fetch files from the local cache directory in order to bring up a new TaskManager. After the local cache directory was removed, the fetcher got stuck and showed the error message below.

       

      I0409 09:21:04.930480 3959 fetcher.cpp:560] Fetcher Info: {"cache_directory":"/tmp/mesos/fetch/root","items":[{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c876-pyflink.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/pyflink.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/pyflink.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c875-config.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/config.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/config.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c872-flink-console.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/flink-console.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/flink-console.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c859-pyflink.bat","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/pyflink.bat","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/pyflink.bat"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c881-logback-console.xml","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/logback-console.xml","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/logback-console.xml"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c871-flink-daemon.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/flink-daemon.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/flink-daemon.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c862-standalone-job.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/standalone-job.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/standalone-job.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c877-flink.bat","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/flink.bat","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/flink.bat"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c879-log4j-cons_properties","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/log4j-console.properties","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/log4j-console.properties"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c870-sql-client.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/sql-client.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/sql-client.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c873-historyserver.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/historyserver.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/historyserver.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c865-flink","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/flink","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/flink"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c892-log4j.properties","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/log4j.properties","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/log4j.properties"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c869-mesos-appmaster.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/mesos-appmaster.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/mesos-appmaster.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c868-start-scala-shell.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/start-scala-shell.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/start-scala-shell.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c866-yarn-session.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/yarn-session.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/yarn-session.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c856-jobmanager.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/jobmanager.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/jobmanager.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c863-pyflink-stream.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/pyflink-stream.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/pyflink-stream.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c895-flink-dist_-1.6.2.jar","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/lib/flink-dist_2.11-1.6.2.jar","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/lib/flink-dist_2.11-1.6.2.jar"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c857-zookeeper.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/zookeeper.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/zookeeper.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c890-logback.xml","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/logback.xml","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/logback.xml"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c858-mesos-appm_ter-job.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/mesos-appmaster-job.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/mesos-appmaster-job.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c884-zoo.cfg","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/zoo.cfg","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/zoo.cfg"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c883-log4j-yarn_properties","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/log4j-yarn-session.properties","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/log4j-yarn-session.properties"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c880-flink-conf.yaml","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/conf/flink-conf.yaml","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/flink-conf.yaml"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c878-start-cluster.bat","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/start-cluster.bat","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/start-cluster.bat"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c888-log4j-cli.properties","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/log4j-cli.properties","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/log4j-cli.properties"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c897-flink-pyth_-1.6.2.jar","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/lib/flink-python_2.11-1.6.2.jar","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/lib/flink-python_2.11-1.6.2.jar"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c864-start-cluster.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/start-cluster.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/start-cluster.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c861-start-zook_-quorum.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/start-zookeeper-quorum.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/start-zookeeper-quorum.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c886-sql-client_aults.yaml","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/sql-client-defaults.yaml","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/sql-client-defaults.yaml"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c874-mesos-taskmanager.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/mesos-taskmanager.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/mesos-taskmanager.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c885-masters","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/masters","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/masters"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c889-flink-conf_jackhe.bak","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/conf/flink-conf.yaml_jackhe.bak","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/flink-conf.yaml_jackhe.bak"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c893-log4j.propertiese","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/log4j.propertiese","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/log4j.propertiese"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c887-logback-yarn.xml","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/logback-yarn.xml","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/logback-yarn.xml"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c894-flink-shad_-1.6.2.jar","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/lib/flink-shaded-hadoop2-uber-1.6.2.jar","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/lib/flink-shaded-hadoop2-uber-1.6.2.jar"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c898-slf4j-log4_-1.7.7.jar","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/lib/slf4j-log4j12-1.7.7.jar","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/lib/slf4j-log4j12-1.7.7.jar"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c896-log4j-1.2.17.jar","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/lib/log4j-1.2.17.jar","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/lib/log4j-1.2.17.jar"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c882-flink-conf_aml_jackhe","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/conf/flink-conf.yaml_jackhe","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/flink-conf.yaml_jackhe"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c860-stop-zooke_-quorum.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/stop-zookeeper-quorum.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/stop-zookeeper-quorum.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c891-slaves","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink/conf/slaves","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/conf/slaves"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c855-stop-cluster.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/stop-cluster.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/stop-cluster.sh"}},{"action":"RETRIEVE_FROM_CACHE","cache_filename":"c867-taskmanager.sh","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink/bin/taskmanager.sh","value":"http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/taskmanager.sh"}}],"sandbox_directory":"/var/lib/mesos/slave/slaves/f998b1af-1b1c-41a5-a4d3-25c8b30a40c1-S48/frameworks/f998b1af-1b1c-41a5-a4d3-25c8b30a40c1-0070/executors/taskmanager-44035/runs/816380d0-c3d6-4c4a-ab82-ea73dcacfcaf","stall_timeout":{"nanoseconds":60000000000},"user":"root"}
      I0409 09:21:04.942873 3959 fetcher.cpp:457] Fetching URI 'http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/pyflink.sh'
      I0409 09:21:04.942909 3959 fetcher.cpp:350] Fetching from cache
      cp: cannot stat ‘/tmp/mesos/fetch/root/c876-pyflink.sh’: No such file or directory
      E0409 09:21:04.945482 3959 fetcher.cpp:613] EXIT with status 1: Failed to fetch 'http://10.167.0.177:10007/a52b4395-f1d3-47d0-88ea-521cd2c9cc09/flink/bin/pyflink.sh': cp failed with status: 256
      Failed to synchronize with agent (it's probably exited)
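
      The failure in the log above can be checked offline with a small diagnostic. The following is a minimal Python sketch, not part of Flink or Mesos: it takes the JSON payload that follows "Fetcher Info:" and lists every RETRIEVE_FROM_CACHE entry whose cache file is absent from "cache_directory". The function names and the trimmed SAMPLE_FETCHER_INFO are hypothetical and used for illustration only. The helper that decodes "cp failed with status: 256" assumes the number is a raw POSIX wait status, in which case it corresponds to exit code 1; that assumption is not confirmed by the report.

```python
import json
import os
import posixpath


def exit_code_from_wait_status(status):
    """Decode a raw POSIX wait status (assumption: that is what the log prints).

    Returns the exit code for a normal exit, or None if the low 7 bits
    indicate the process was terminated by a signal.
    """
    if status & 0x7F:
        return None
    return (status >> 8) & 0xFF


def missing_cache_files(fetcher_info_json, exists=os.path.exists):
    """Return the cache paths the fetcher expects but that are not on disk.

    `exists` is injectable so the check can be exercised without touching
    the real filesystem.
    """
    info = json.loads(fetcher_info_json)
    cache_dir = info["cache_directory"]
    missing = []
    for item in info["items"]:
        # Only entries served from the local cache depend on the directory.
        if item.get("action") != "RETRIEVE_FROM_CACHE":
            continue
        path = posixpath.join(cache_dir, item["cache_filename"])
        if not exists(path):
            missing.append(path)
    return missing


# Hypothetical sample: the first item of the logged payload, trimmed.
SAMPLE_FETCHER_INFO = json.dumps({
    "cache_directory": "/tmp/mesos/fetch/root",
    "items": [{
        "action": "RETRIEVE_FROM_CACHE",
        "cache_filename": "c876-pyflink.sh",
        "uri": {"cache": True, "executable": True, "extract": False,
                "output_file": "flink/bin/pyflink.sh"},
    }],
})

if __name__ == "__main__":
    for path in missing_cache_files(SAMPLE_FETCHER_INFO):
        print("missing from fetcher cache:", path)
```

      Run on the affected agent with the full payload from the log, any path this prints is one the fetcher would fail to copy, as it did for c876-pyflink.sh.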

      Attachments

        1. stderr (10)
          13 kB
          JackHe

          People

            Assignee: Unassigned
            Reporter: JackHe (jackhe90)