Spark / SPARK-23941

Mesos task failed on specific Spark app name


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.2.2, 2.3.1, 2.4.0
    • Component/s: Mesos, Spark Submit
    • Labels:
      None
    • Environment:

      OS: Ubuntu 16.04

      Spark: 2.3.0

      Mesos: 1.5.0

      Description

      This appears to be a bug in Spark's MesosClusterDispatcher. To reproduce it, you need a running Mesos cluster and a running Mesos dispatcher.

      I'm running Mesos 1.5 and Spark 2.3.0 (the same failure occurs with 2.2.1).

      If you submit the following application:

      spark-submit --master mesos://127.0.1.1:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi --name "my favorite task (myId = 123-456)" /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100

      the task fails with the following output:

      I0409 11:00:35.360352 22726 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110035-0004\/runs\/8ac20902-74e1-45c4-9ab6-c52a79940189","user":"tiboun"}
      I0409 11:00:35.363119 22726 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
      I0409 11:00:35.363143 22726 fetcher.cpp:291] Fetching directly into the sandbox directory
      I0409 11:00:35.363168 22726 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
      W0409 11:00:35.366839 22726 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
      I0409 11:00:35.366873 22726 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189/spark-examples_2.11-2.3.0.jar'
      I0409 11:00:35.366878 22726 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189'
      I0409 11:00:35.438725 22733 exec.cpp:162] Version: 1.5.0
      I0409 11:00:35.440770 22734 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
      I0409 11:00:35.441388 22733 executor.cpp:171] Received SUBSCRIBED event
      I0409 11:00:35.441586 22733 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
      I0409 11:00:35.441643 22733 executor.cpp:171] Received LAUNCH event
      I0409 11:00:35.441767 22733 executor.cpp:638] Starting task driver-20180409110035-0004
      I0409 11:00:35.445050 22733 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
      I0409 11:00:35.445770 22733 executor.cpp:651] Forked command at 22743
      sh: 1: Syntax error: "(" unexpected
      I0409 11:00:35.538661 22736 executor.cpp:938] Command exited with status 2 (pid: 22743)
      I0409 11:00:36.541016 22739 process.cpp:887] Failed to accept socket: future discarded
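
The line `sh: 1: Syntax error: "(" unexpected` suggests the app name reaches `/bin/sh` unquoted, where `(` is a shell operator rather than an ordinary character. The sketch below reproduces that parse error under one assumption: the name is pasted verbatim into a shell command line. The real command built by MesosClusterDispatcher is not shown in this report, so `parses_ok` and `app.jar` are purely illustrative. `sh -n` only parses the string and never executes it.

```python
import shlex
import subprocess

# Assumption: the dispatcher pastes the name into a shell command line as-is.
APP_NAME = "my favorite task (myId = 123-456)"


def parses_ok(command: str) -> bool:
    """Return True if /bin/sh accepts `command` syntactically.

    `sh -n` reads and parses the command without running it.
    """
    result = subprocess.run(
        ["sh", "-n", "-c", command], capture_output=True, text=True
    )
    return result.returncode == 0


# Hypothetical command lines: unquoted vs. shell-escaped name.
unquoted = f"spark-submit --name {APP_NAME} app.jar 100"
quoted = f"spark-submit --name {shlex.quote(APP_NAME)} app.jar 100"

print(parses_ok(unquoted))  # False: sh rejects the "(" at parse time
print(parses_ok(quoted))    # True: the name is one single-quoted word
```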
      

      If you remove the parentheses, you get the following result:

       

      I0409 11:03:02.023701 23085 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110301-0006\/runs\/f887c0ab-b48f-4382-850c-383c1c944269","user":"tiboun"}
      I0409 11:03:02.028268 23085 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
      I0409 11:03:02.028302 23085 fetcher.cpp:291] Fetching directly into the sandbox directory
      I0409 11:03:02.028336 23085 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
      W0409 11:03:02.031209 23085 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
      I0409 11:03:02.031250 23085 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/spark-examples_2.11-2.3.0.jar'
      I0409 11:03:02.031258 23085 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269'
      I0409 11:03:02.090797 23095 exec.cpp:162] Version: 1.5.0
      I0409 11:03:02.095283 23092 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
      I0409 11:03:02.096693 23095 executor.cpp:171] Received SUBSCRIBED event
      I0409 11:03:02.097040 23095 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
      I0409 11:03:02.097141 23095 executor.cpp:171] Received LAUNCH event
      I0409 11:03:02.097357 23095 executor.cpp:638] Starting task driver-20180409110301-0006
      I0409 11:03:02.101521 23095 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
      I0409 11:03:02.102332 23095 executor.cpp:651] Forked command at 23100
      Error: Cannot load main class from JAR file:/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/favorite
      Run with --help for usage help or --verbose for debug output
      I0409 11:03:02.792325 23090 executor.cpp:938] Command exited with status 1 (pid: 23100)
      I0409 11:03:03.794505 23098 process.cpp:887] Failed to accept socket: future discarded
      

      Interestingly, the launched driver command tries to load the main class from a file called "favorite", which is the second word of the task name.
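
One plausible reading of this error, again assuming the name is inserted into the command unquoted: the shell splits it on spaces, so `--name` receives only "my", and "favorite" becomes the first positional argument, which spark-submit treats as the application JAR. The sketch below models this with Python's `shlex.split`. `first_positional` is a hypothetical, simplified parser that assumes every `--` option takes exactly one value, which does not hold for flags like `--verbose`. The exact name used in this second run is also an assumption, since the report only says the parentheses were removed.

```python
import shlex

# Assumed name for the second run: the original name without "(" and ")".
APP_NAME = "my favorite task myId = 123-456"
JAR = "spark-examples_2.11-2.3.0.jar"


def first_positional(argv):
    """Return the first token that is neither an option nor an option value.

    Simplified model: each "--flag" consumes exactly one following value,
    and the first remaining token is taken as the primary resource (JAR).
    """
    i = 0
    while i < len(argv):
        if argv[i].startswith("--"):
            i += 2  # skip the flag and its value
        else:
            return argv[i]
    return None


unquoted = f"--class org.apache.spark.examples.SparkPi --name {APP_NAME} {JAR} 100"
quoted = f"--class org.apache.spark.examples.SparkPi --name {shlex.quote(APP_NAME)} {JAR} 100"

print(first_positional(shlex.split(unquoted)))  # favorite
print(first_positional(shlex.split(quoted)))    # spark-examples_2.11-2.3.0.jar
```

Shell-escaping the name before it is joined into the command keeps it as a single argument. This is one general way to avoid the split. It is not necessarily how the Spark fix was implemented.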

       

      I've tried launching spark-shell with the same name, and it works fine: the task names take the driver's name and append a sequence number.

       


              People

              • Assignee:
                tiboun bounkong khamphousone
                Reporter:
                tiboun bounkong khamphousone
              • Votes: 0
                Watchers: 3
