Uploaded image for project: 'Spot'
  1. Spot
  2. SPOT-200

[Ingest] Unable to find class in kafka streaming jar when submitted to spark with -jar

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Workaround
    • Affects Version/s: None
    • Fix Version/s: None
    • Environment:

      Description

      Hi,

      So I have been working through attempting to set up a three-node installation of spot on top of Cloudera, however, it seems that when running the spark-submit task from worker.py in spot-ingest/pipelines/proxy/worker.py the spark runner cannot find the kafka/common/TopicAndPartition class, so then begins failing in a massive way (26k lines of errors as each of the threads die)

      I have ensured the spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar is available (also please see my pull request on Github RE documentation confusion), and also checked that the class does exist in the jar file I am using, along with trying multiple versions of the kafka jar file, however it still errors.

      I have pulled out the spark-submit command and run it separately and locally and it outputs the following: (<user> and <ipaddress> redacted)

      UPDATE re Docker image: the image doesn't have the ingest components in it? So turns out I can't test with that.

      Any advice would be much appreciated.

      Many thanks,
      Rob

      (apologies, multiline preformatted styling doesn't seem to work)
      {{
      <user>@spot-ingest:~/spot-ingest$ spark-submit --master local[2] --driver-memory 4000M --num-executors 1 --conf spark.executor.memory=4000M --conf spark.executor.cores=4 --jars /home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar --driver-class-path /home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.11.jar,/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/kafka_2.10-0.9.0-kafka-2.0.2.jar /home/<user>/spot-ingest/pipelines/proxy/bluecoat.py -zk <ipaddress>:2181 -t SPOT-INGEST-proxy-13_30_45 -db spotdb -dt proxy -w 1 -bs 20
      17/07/12 09:35:23 INFO spark.SparkContext: Running Spark version 1.6.0
      17/07/12 09:35:24 INFO spark.SecurityManager: Changing view acls to: <user>
      17/07/12 09:35:24 INFO spark.SecurityManager: Changing modify acls to: <user>
      17/07/12 09:35:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(<user>); users with modify permissions: Set(<user>)
      17/07/12 09:35:24 INFO util.Utils: Successfully started service 'sparkDriver' on port 32918.
      17/07/12 09:35:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
      17/07/12 09:35:24 INFO Remoting: Starting remoting
      17/07/12 09:35:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@<ipaddress>:41872]
      17/07/12 09:35:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@<ipaddress>:41872]
      17/07/12 09:35:25 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 41872.
      17/07/12 09:35:25 INFO spark.SparkEnv: Registering MapOutputTracker
      17/07/12 09:35:25 INFO spark.SparkEnv: Registering BlockManagerMaster
      17/07/12 09:35:25 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-a6617e16-f65c-4c1e-8954-2654d4428f2d
      17/07/12 09:35:25 INFO storage.MemoryStore: MemoryStore started with capacity 2.0 GB
      17/07/12 09:35:25 INFO spark.SparkEnv: Registering OutputCommitCoordinator
      17/07/12 09:35:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
      17/07/12 09:35:25 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
      17/07/12 09:35:25 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
      17/07/12 09:35:25 INFO ui.SparkUI: Started SparkUI at http://<ipaddress>:4040
      17/07/12 09:35:25 INFO spark.SparkContext: Added JAR file:/home/<user>/spot-ingest/common/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar at spark://<ipaddress>:32918/jars/spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar with timestamp 1499848525416
      17/07/12 09:35:25 INFO util.Utils: Copying /home/<user>/spot-ingest/pipelines/proxy/bluecoat.py to /tmp/spark-81869c29-772d-4bbf-a53a-3d8556418094/userFiles-0dd85ff4-063c-489b-b6ef-6ea15af4b9cf/bluecoat.py
      17/07/12 09:35:25 INFO spark.SparkContext: Added file file:/home/<user>/spot-ingest/pipelines/proxy/bluecoat.py at file:/home/<user>/spot-ingest/pipelines/proxy/bluecoat.py with timestamp 1499848525700
      17/07/12 09:35:25 INFO executor.Executor: Starting executor ID driver on host localhost
      17/07/12 09:35:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42401.
      17/07/12 09:35:25 INFO netty.NettyBlockTransferService: Server created on 42401
      17/07/12 09:35:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
      17/07/12 09:35:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:42401 with 2.0 GB RAM, BlockManagerId(driver, localhost, 42401)
      17/07/12 09:35:25 INFO storage.BlockManagerMaster: Registered BlockManager
      Exception in thread "Thread-3" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
      at java.lang.Class.getDeclaredMethods0(Native Method)
      at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
      at java.lang.Class.privateGetPublicMethods(Class.java:2690)
      at java.lang.Class.getMethods(Class.java:1467)
      at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
      at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)
      at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
      at py4j.Gateway.invoke(Gateway.java:252)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:209)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      ... 12 more
      Exception in thread "Thread-23" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
      at java.lang.Class.getDeclaredMethods0(Native Method)
      at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
      at java.lang.Class.privateGetPublicMethods(Class.java:2690)
      at java.lang.Class.getMethods(Class.java:1467)
      at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
      at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)
      at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
      at py4j.Gateway.invoke(Gateway.java:252)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:209)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      ... 12 more
      ERROR:py4j.java_gateway:Error while sending or receiving.
      Traceback (most recent call last):
      File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
      raise Py4JError("Answer from Java side is empty")
      Py4JError: Answer from Java side is empty
      ERROR:py4j.java_gateway:Error while sending or receiving.
      Traceback (most recent call last):
      File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
      raise Py4JError("Answer from Java side is empty")
      Py4JError: Answer from Java side is empty
      Exception in thread "Thread-24" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
      at java.lang.Class.getDeclaredMethods0(Native Method)
      at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
      at java.lang.Class.privateGetPublicMethods(Class.java:2690)
      at java.lang.Class.getMethods(Class.java:1467)
      at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:367)
      at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:319)ERROR:py4j.java_gateway:Error while sending or receiving.
      Traceback (most recent call last):
      File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
      raise Py4JError("Answer from Java side is empty")
      Py4JError: Answer from Java side is empty
      }}

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                rphi Rob Phipps
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: