Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Workaround
-
None
-
None
Description
Hi,
So I have been working through attempting to set up a three-node installation of spot on top of Cloudera, however, it seems that when running the spark-submit task from worker.py in spot-ingest/pipelines/proxy/worker.py the spark runner cannot find the kafka/common/TopicAndPartition class, so then begins failing in a massive way (26k lines of errors as each of the threads die)
I have ensured the spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar is available (also please see my pull request on Github RE documentation confusion), and also checked that the class does exist in the jar file I am using, along with trying multiple versions of the kafka jar file, however it still errors.
I have pulled out the spark-submit command and run it separately and locally and it outputs the following: (<user> and <ipaddress> redacted)
UPDATE re Docker image: the image doesn't have the ingest components in it? So turns out I can't test with that.
Any advice would be much appreciated.
Many thanks,
Rob
(apologies, multiline preformatted styling doesn't seem to work)
{{
<user>@spot-ingest:~/spot-ingest$ spark-submit --master local[2] --driver-memory 4000M --num-executors 1 --conf spark.executor.memory=4000M --conf spark.executor.cores=4 --jars /home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar --driver-class-path /home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.11.jar,/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/kafka_2.10-0.9.0-kafka-2.0.2.jar /home/<user>/spot-ingest/pipelines/proxy/bluecoat.py -zk <ipaddress>:2181 -t SPOT-INGEST-proxy-13_30_45 -db spotdb -dt proxy -w 1 -bs 20
17/07/12 09:35:23 INFO spark.SparkContext: Running Spark version 1.6.0
17/07/12 09:35:24 INFO spark.SecurityManager: Changing view acls to: <user>
17/07/12 09:35:24 INFO spark.SecurityManager: Changing modify acls to: <user>
17/07/12 09:35:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(<user>); users with modify permissions: Set(<user>)
17/07/12 09:35:24 INFO util.Utils: Successfully started service 'sparkDriver' on port 32918.
17/07/12 09:35:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/12 09:35:24 INFO Remoting: Starting remoting
17/07/12 09:35:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@<ipaddress>:41872]
17/07/12 09:35:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@<ipaddress>:41872]
17/07/12 09:35:25 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 41872.
17/07/12 09:35:25 INFO spark.SparkEnv: Registering MapOutputTracker
17/07/12 09:35:25 INFO spark.SparkEnv: Registering BlockManagerMaster
17/07/12 09:35:25 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-a6617e16-f65c-4c1e-8954-2654d4428f2d
17/07/12 09:35:25 INFO storage.MemoryStore: MemoryStore started with capacity 2.0 GB
17/07/12 09:35:25 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/07/12 09:35:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/07/12 09:35:25 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/07/12 09:35:25 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/07/12 09:35:25 INFO ui.SparkUI: Started SparkUI at http://<ipaddress>:4040
17/07/12 09:35:25 INFO spark.SparkContext: Added JAR file:/home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar at spark://<ipaddress>:32918/jars/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar with timestamp 1499848525416
17/07/12 09:35:25 INFO util.Utils: Copying /home/<user>/spot-ingest/pipelines/proxy/bluecoat.py to /tmp/spark-81869c29-772d-4bbf-a53a-3d8556418094/userFiles-0dd85ff4-063c-489b-b6ef-6ea15af4b9cf/bluecoat.py
17/07/12 09:35:25 INFO spark.SparkContext: Added file file:/home/<user>/spot-ingest/pipelines/proxy/bluecoat.py at file:/home/<user>/spot-ingest/pipelines/proxy/bluecoat.py with timestamp 1499848525700
17/07/12 09:35:25 INFO executor.Executor: Starting executor ID driver on host localhost
17/07/12 09:35:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42401.
17/07/12 09:35:25 INFO netty.NettyBlockTransferService: Server created on 42401
17/07/12 09:35:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/12 09:35:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:42401 with 2.0 GB RAM, BlockManagerId(driver, localhost, 42401)
17/07/12 09:35:25 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "Thread-3" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
at java.lang.Class.privateGetPublicMethods(Class.java:2690)
at java.lang.Class.getMethods(Class.java:1467)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
Exception in thread "Thread-23" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
at java.lang.Class.privateGetPublicMethods(Class.java:2690)
at java.lang.Class.getMethods(Class.java:1467)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
Exception in thread "Thread-24" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
at java.lang.Class.privateGetPublicMethods(Class.java:2690)
at java.lang.Class.getMethods(Class.java:1467)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
}}
Attachments
Issue Links
- relates to
-
SPOT-206 [Documentation] Apache Spot web site indicates Spark 1.6.0 is required but we are actually working with 2.1.0
- Open