[ZEPPELIN-3727] Spark commands execute correctly, but log extreme number of errors - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.7.3
Fix Version/s: None
Component/s: Interpreters
Labels: None

Description

I'm running EMR 5.16.0 on AWS. When I run Spark SQL queries against my RDBMS using the Scala interpreter, they execute correctly, but the log file fills with this exception, repeated over and over:

ERROR [2018-08-16 22:04:36,601] ({pool-2-thread-2} SparkInterpreter.java[getProgressFromStage_1_1x]:1503) - Error on getting progress information 
java.lang.NoSuchMethodException: org.apache.zeppelin.spark.SparkInterpreter$1.stageIdToData() 
       at java.lang.Class.getMethod(Class.java:1786) 
       at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1487) 
       at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1510) 
       at org.apache.zeppelin.spark.SparkInterpreter.getProgress(SparkInterpreter.java:1430) 
       at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:117) 
       at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:555) 
       at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1762) 
       at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1747) 
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) 
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
       at java.lang.Thread.run(Thread.java:748)
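
The top frames of the trace show `Class.getMethod` throwing while `getProgressFromStage_1_1x` looks up `stageIdToData` by name, so the error appears to come from a reflective lookup of a method the target class does not define. The following is a minimal sketch of that failure mode only; `ListenerWithoutStageData` and `findMethod` are hypothetical names for illustration and are not Zeppelin's code.

```scala
// Hypothetical stand-in for the object inspected in the trace.
// It deliberately does not define a stageIdToData() method.
class ListenerWithoutStageData

object ReflectiveLookupSketch {
  // Looks up a public zero-argument method by name.
  // Returns None instead of letting NoSuchMethodException propagate.
  def findMethod(target: AnyRef, name: String): Option[java.lang.reflect.Method] =
    try Some(target.getClass.getMethod(name))
    catch { case _: NoSuchMethodException => None }

  def main(args: Array[String]): Unit = {
    val listener = new ListenerWithoutStageData
    // stageIdToData is absent, so the lookup yields None.
    println(findMethod(listener, "stageIdToData").isDefined)
    // toString is inherited from java.lang.Object, so the lookup succeeds.
    println(findMethod(listener, "toString").isDefined)
  }
}
```

If the lookup runs on every progress poll and each failure is logged with a full stack trace, a long-running paragraph would produce one such entry per poll, which would be consistent with the log growth described in this report.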

This simple code triggers it (hitting my own database). I'm not convinced it has anything to do with Spark SQL, though; I suspect it is related to long-running commands instead.

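import org.apache.spark.sql._

// JDBC connection options; the URL is redacted.
val dbConnectionMap = Map(
  "url" -> "<redacted>",
  "driver" -> "com.mysql.jdbc.Driver"
)

// Subquery aliased as a table so it can be passed as the "dbtable" option.
val sql = """(select item_name from product_catalog) as product_catalog"""
val products = spark.read.format("jdbc").options(dbConnectionMap + ("dbtable" -> sql)).load.cache

// cache is lazy, so count is what actually runs the query.
products.count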

This wouldn't be a big concern, since execution works, except that after a couple of hours of analyzing data I started getting file system errors. The cause turned out to be the log file, which had grown to 33 GB and filled the disk.

People

Assignee: Unassigned

Reporter: Tim Gautier

Votes: 0

Watchers: 3

Dates

Created: 16/Aug/18 22:18

Updated: 19/Jan/19 01:19