  1. Zeppelin
  2. ZEPPELIN-2536

Intermittent Classloading error running SQL in Zeppelin

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      This error occurs intermittently when I run a SQL paragraph with SparkSQL via the native Spark interpreter. The error goes away on the next run but reappears at random.

      Reproduced with Zeppelin version 0.7.1

      In the first paragraph

      %spark
      val textFile = spark.read.textFile("path to any csv file")
      val linesDF = textFile.toDF("line")
      val wordsDF = linesDF.explode("line","word")((line: String) => line.split(" "))
      val wordCountDF = wordsDF.groupBy("word").count()
      wordCountDF.registerTempTable("words")

      In the second paragraph

      %spark.sql
      select * from words where word not in ('-', 'to', 'the','Feature', 'in', 'and','for', 'Core', 'not','Zeppelin', 'with', 'is','Cause', 'Minor', 'a','Feature', 'on', 'from', 'SOLUTION:', 'customer', 'of','this', 'as', 'user','RESOLUTION', 'Core', 'not','Zeppelin', 'with', 'is','Cause', 'Minor', 'a','Feature', 'on', 'from') order by count desc limit 100

      Error returned by the second paragraph

      java.lang.NoSuchMethodException: org.apache.spark.io.LZ4CompressionCodec.<init>(org.apache.spark.SparkConf)
      at java.lang.Class.getConstructor0(Class.java:3082)
      at java.lang.Class.getConstructor(Class.java:1825)
      at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:71)
      at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
      at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$$decodeUnsafeRows(SparkPlan.scala:250)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:336)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:336)
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
      at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
      at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
      at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
      at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
      at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
      at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
      at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
      at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
      at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
      at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
      at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
      at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:235)
      at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:130)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      ERROR
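
The top frames of the trace show `CompressionCodec$.createCodec` calling `Class.getConstructor`, which is where the `NoSuchMethodException` is raised. As a minimal sketch of that mechanism (not Spark's actual code), the Scala snippet below uses two hypothetical stand-in classes, `Conf` and `Codec`, and a hypothetical helper `findCtor`. `getConstructor` matches parameter types by `Class` object identity, so the lookup fails whenever the `Class` passed in is not the exact one in the constructor signature. One situation where that can happen is when two classloaders each define a class with the same name. Whether that is what happens in this issue is an assumption, not something the report confirms.

```scala
object ConstructorLookup {
  // Hypothetical stand-ins for SparkConf and a codec class.
  class Conf
  class Codec(conf: Conf)

  /** Reflectively looks up a public one-argument constructor.
    * Returns the constructor's description, or the exception message on failure.
    */
  def findCtor(target: Class[_], param: Class[_]): Either[String, String] =
    try Right(target.getConstructor(param).toString)
    catch { case e: NoSuchMethodException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    // Parameter type matches the declared constructor: lookup succeeds.
    println(findCtor(classOf[Codec], classOf[Conf]))
    // Parameter type does not match: lookup fails with NoSuchMethodException.
    println(findCtor(classOf[Codec], classOf[String]))
  }
}
```

The message produced by the failing lookup has the same `<init>(...)` shape as the one in the trace above.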

      From the Zeppelin interpreter log

      org.apache.zeppelin.interpreter.InterpreterException: java.lang.reflect.InvocationTargetException
      at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:239)
      at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:130)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:235)
      ... 12 more
      Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
      Exchange hashpartitioning(word#179, 200)
      +- *HashAggregate(keys=[word#179], functions=[partial_count(1)], output=[word#179, count#200L])
         +- *Filter NOT word#179 INSET (this,of,from,with,Zeppelin,user,SOLUTION:,Cause,is,and,a,as,RESOLUTION,Core,to,the,customer,Feature,not,-,on,in,Minor,for)
            +- Generate UserDefinedGenerator(line#175), false, false, [word#179]
               +- *Project [value#168 AS line#175]
                  +- *FileScan text [value#168] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/Users/vshukla/Documents/ZeppelinAllTickets.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>

      at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
      at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
      at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
      at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
      at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
      at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:133)
      at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
      at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
      at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
      at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
      at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
      at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
      at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
      ... 17 more
      Caused by: java.lang.NoSuchMethodException: org.apache.spark.io.LZ4CompressionCodec.<init>(org.apache.spark.SparkConf)
      at java.lang.Class.getConstructor0(Class.java:3082)
      at java.lang.Class.getConstructor(Class.java:1825)
      at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:71)
      at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
      at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:75)
      at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:83)
      at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
      at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
      at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1411)
      at org.apache.spark.sql.execution.datasources.text.TextFileFormat.buildReader(TextFileFormat.scala:105)
      at org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:119)
      at org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:150)
      at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:253)
      at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:251)
      at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:271)
      at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:42)
      at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
      at org.apache.spark.sql.execution.GenerateExec.doExecute(GenerateExec.scala:100)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
      at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
      at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:124)
      at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
      at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
      at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:85)
      at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:121)
      at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112)
      at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
      ... 44 more
      INFO [2017-05-11 16:41:47,216] ({pool-2-thread-5} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1494546107182 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1414790412
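
To gather more evidence the next time the error appears, a paragraph could print which classloader and jar each class involved in the failing lookup came from. The sketch below is an assumption about what would be useful to check, not a confirmed diagnosis. `ClassOriginCheck` and `describe` are hypothetical names, and the demo in `main` uses `java.lang.String` so that it runs without Spark.

```scala
object ClassOriginCheck {
  /** Summarises the classloader, code source and public constructors of a class.
    * The output layout is our own and not a Spark or Zeppelin format.
    */
  def describe(cls: Class[_]): String = {
    // getClassLoader returns null for classes loaded by the bootstrap loader.
    val loader = Option(cls.getClassLoader).map(_.toString).getOrElse("bootstrap")
    // The code source (usually a jar URL) may also be absent.
    val source = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("unknown")
    val ctors = cls.getConstructors.map(_.toString).mkString("\n    ")
    s"${cls.getName}\n  loader: $loader\n  source: $source\n  public constructors:\n    $ctors"
  }

  def main(args: Array[String]): Unit = {
    // Class names may be passed as arguments; defaults to a JDK class.
    val names = if (args.nonEmpty) args.toSeq else Seq("java.lang.String")
    names.foreach(name => println(describe(Class.forName(name))))
  }
}
```

In a `%spark` paragraph, the same helper could be applied to `Class.forName("org.apache.spark.io.LZ4CompressionCodec")` and `classOf[org.apache.spark.SparkConf]`, the two classes named in the exception. If they report different loaders or sources between a passing and a failing run, that would support a classloading explanation.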

            People

              Assignee: Unassigned
              Reporter: Vinay Shukla (vinayshukla@gmail.com)
              Votes: 0
              Watchers: 3