  ZEPPELIN-3727

Spark commands execute correctly, but log extreme number of errors


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.3
    • Fix Version/s: None
    • Component/s: Interpreters
    • Labels: None

    Description

      I'm running EMR 5.16.0 on AWS. If I run Spark SQL queries against my RDBMS using the Scala interpreter, they seem to execute just fine; however, the log file fills with this exception over and over again:

      ERROR [2018-08-16 22:04:36,601] ({pool-2-thread-2} SparkInterpreter.java[getProgressFromStage_1_1x]:1503) - Error on getting progress information 
      java.lang.NoSuchMethodException: org.apache.zeppelin.spark.SparkInterpreter$1.stageIdToData() 
             at java.lang.Class.getMethod(Class.java:1786) 
             at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1487) 
             at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1510) 
             at org.apache.zeppelin.spark.SparkInterpreter.getProgress(SparkInterpreter.java:1430) 
             at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:117) 
             at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:555) 
             at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1762) 
             at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1747) 
             at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
             at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
             at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) 
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
             at java.lang.Thread.run(Thread.java:748)
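
      From the stack trace it looks like the interpreter looks the method up by name via reflection (Class.getMethod) and simply doesn't find a stageIdToData() method on whatever listener object it is handed. A stripped-down sketch of that failure mode, with made-up class names rather than Zeppelin's real ones:

      // Illustrative only, not Zeppelin's actual code: Class.getMethod throws
      // NoSuchMethodException when the named method doesn't exist on the object
      // being inspected, which matches the trace above.
      object ReflectionSketch {
        class FakeListener // stands in for the progress listener; it has no stageIdToData()

        def main(args: Array[String]): Unit = {
          val listener = new FakeListener
          try {
            val m = listener.getClass.getMethod("stageIdToData") // the call at Class.java:1786 above
            println(s"found $m")
          } catch {
            case e: NoSuchMethodException =>
              // The progress polling apparently hits this branch on every poll,
              // so the log keeps growing even though the query itself succeeds.
              println(s"lookup failed: $e")
          }
        }
      }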
      

      This simple code will trigger it (hitting my own database), though I'm not convinced it has anything to do with Spark SQL specifically; it may simply be a matter of long-running commands.

      import org.apache.spark.sql._
      
      // JDBC connection options for the source database (URL redacted)
      val dbConnectionMap = Map(
        "url"    -> "<redacted>",
        "driver" -> "com.mysql.jdbc.Driver"
      )
      
      // Wrap the query as a derived table so it can be passed as "dbtable"
      val sql = """(select item_name from product_catalog) as product_catalog"""
      val products = spark.read.format("jdbc").options(dbConnectionMap + ("dbtable" -> sql)).load.cache
      
      products.count
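
      If it really is about long-running commands rather than Spark SQL specifically, I'd expect a paragraph like the one below, which doesn't touch a database at all, to trigger the same flood, since the progress polling happens for any paragraph that's still running (sketch only; sc is the SparkContext the Spark interpreter provides):

      // No database involved: just a paragraph that runs long enough for
      // Zeppelin to poll getProgress several times while it executes.
      val slow = sc.parallelize(1 to 1000, 10).map { i =>
        Thread.sleep(50) // roughly 5 seconds of work per task (100 elements x 50 ms)
        i
      }
      slow.count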
      

      This wouldn't be a big concern since the execution works, except that after a couple of hours of analyzing data I started getting file system errors. It turned out the log file had consumed all of the disk space: 33 GB!

       

          People

            Assignee: Unassigned
            Reporter: Tim Gautier (timgautier)
            Votes: 0
            Watchers: 3
