ZEPPELIN-194: Exception on %sql over CSV dataframe

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0, 0.6.1, 0.6.2
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      In local Spark, an exception occurs on "%sql select * from ... limit 10", while z.show(df) on the same dataset displays fine.

      Steps to reproduce:

      %dep
      z.load("com.databricks:spark-csv_2.10:1.1.0")
      
      %sh
      wget https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv
      
      %spark
      val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("bank.csv")
      df.registerTempTable("bank")
      
      z.show(df)
      
      %sql select * from bank limit 10
      
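      A possible workaround (an assumption, not verified in this report): if the jar loaded via %dep is not visible to the SQL interpreter's classloader, supplying the package through spark-submit options instead of %dep makes it available to the whole Spark interpreter process. A sketch for conf/zeppelin-env.sh:

      ```shell
      # Hypothetical workaround sketch: pass the spark-csv package through
      # spark-submit options so the whole Zeppelin Spark interpreter process
      # sees it, instead of loading it per-notebook with %dep.
      export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.1.0"
      ```

      The Spark interpreter would need to be restarted for the option to take effect.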

      Exception:

      INFO [2015-08-01 18:06:38,986] ({pool-2-thread-3} Logging.scala[logInfo]:59) - Created broadcast 2 from textFile at CsvRelation.scala:66
      ERROR [2015-08-01 18:06:39,002] ({pool-2-thread-3} Job.java[run]:183) - Job failed
      org.apache.zeppelin.interpreter.InterpreterException: java.lang.reflect.InvocationTargetException
      	at org.apache.zeppelin.spark.ZeppelinContext.showRDD(ZeppelinContext.java:301)
      	at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:134)
      	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
      	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
      	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
      	at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
      	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.reflect.InvocationTargetException
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.zeppelin.spark.ZeppelinContext.showRDD(ZeppelinContext.java:296)
      	... 13 more
      Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$1$$anonfun$1
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:270)
      	at org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:455)
      	at com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassReader.accept(Unknown Source)
      	at com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassReader.accept(Unknown Source)
      	at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:101)
      	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:197)
      	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
      	at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
      	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
      	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
      	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
      	at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
      	at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:83)
      	at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:101)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
      	at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:314)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
      	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:943)
      	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:941)
      	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:947)
      	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:947)
      	at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
      	at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
      	at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1262)
      	... 18 more
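      The trace bottoms out in a ClassNotFoundException for an anonymous closure class generated by spark-csv's buildScan, raised from Spark's ClosureCleaner. This is consistent with a classloader-visibility problem: Class.forName only searches the invoking classloader's path, so a jar added by %dep in one interpreter context is invisible to code running against a loader that never received it. A hypothetical, standalone sketch of that mechanism (not Zeppelin code; class names are taken from the trace):

      ```java
      // Standalone illustration of the failure mode in the trace above:
      // Class.forName fails for any class that is not on the classpath of
      // the classloader doing the lookup, regardless of whether some other
      // loader in another context has the jar.
      public class LoaderDemo {
          // Returns true when `name` is loadable from this JVM's classpath.
          static boolean canLoad(String name) {
              try {
                  Class.forName(name);
                  return true;
              } catch (ClassNotFoundException e) {
                  return false;
              }
          }

          public static void main(String[] args) {
              // spark-csv is not on this demo's classpath, so this prints
              // false, mirroring the lookup that fails inside ClosureCleaner.
              System.out.println(canLoad("com.databricks.spark.csv.CsvRelation"));
              // A JDK class resolves fine from the same loader:
              System.out.println(canLoad("java.util.ArrayList"));
          }
      }
      ```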
      

People

    Assignee: Unassigned
    Reporter: bzz (Alexander Bezzubov)
    Votes: 2
    Watchers: 8