[SPARK-27264] spark sql released all executor but the job is not done - ASF JIRA

Details

Type: Question
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: SQL
Labels: None
Environment:

Azure HDInsight, Spark 2.4, on Azure Storage. SQL: reads and joins some data, then writes the result to a Hive metastore table. The query is executed on JupyterHub, while the pre-migration cluster used plain Jupyter (non-hub).

Description

I have a Spark SQL query that used to finish in under 10 minutes and now takes 3 hours after a cluster migration, and I need to dig into what it is actually doing. I'm new to Spark, so please bear with me if I'm asking something unrelated.

I increased spark.executor.memory, but with no luck. Environment: Azure HDInsight, Spark 2.4, on Azure Storage. The SQL reads and joins some data and finally writes the result to a Hive metastore table.

The spark.sql call ends with the code below: .write.mode("overwrite").saveAsTable("default.mikemiketable")

Application behavior: within the first 15 minutes, it loads the data and completes most tasks (199/200). After that, only 1 executor process stays alive, continually doing shuffle read/write. Because only 1 executor is left, we have to wait 3 hours for the application to finish.

Only 1 executor is left alive:

I'm not sure what that executor is doing:

From time to time, we can see the shuffle read increase:

Therefore I increased spark.executor.memory to 20g, but nothing changed. From Ambari and YARN I can tell the cluster has plenty of resources left.

Almost all executors are released:
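
The pattern described above (199 of 200 tasks finishing quickly, one task running for hours while its shuffle read keeps growing) is often what join-key skew looks like, although nothing in this report confirms that skew is the cause here. The following is only a minimal pure-Python sketch of the idea, not the actual job: the key values and row counts are invented, and `hash(key) % 200` merely stands in for Spark's hash partitioning, with 200 being the default value of spark.sql.shuffle.partitions. It shows how a single heavily repeated key puts nearly all rows into one shuffle partition, and therefore into one task.

```python
from collections import Counter
from statistics import median

NUM_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions


def make_skewed_keys(hot_rows=100_000, normal_keys=10_000, rows_per_key=10):
    """Yield hypothetical join keys: key 0 is 'hot', keys 1..normal_keys are not."""
    for _ in range(hot_rows):
        yield 0
    for key in range(1, normal_keys + 1):
        for _ in range(rows_per_key):
            yield key


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Count rows per shuffle partition under hash partitioning.

    hash(key) % num_partitions is a stand-in for Spark's partitioner; the
    only property that matters is that equal keys map to the same partition.
    """
    sizes = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        sizes[hash(key) % num_partitions] += 1
    return sizes


def skew_report(sizes):
    """Return (largest partition, median partition, ratio between them)."""
    largest = max(sizes.values())
    typical = median(sizes.values())
    return largest, typical, largest / typical


if __name__ == "__main__":
    largest, typical, ratio = skew_report(partition_sizes(make_skewed_keys()))
    print(f"largest partition: {largest} rows")
    print(f"median partition:  {typical} rows")
    print(f"ratio:             {ratio:.0f}x")
```

If skew of this kind is the cause, adding executor memory would not be expected to help, because one task processes one partition on one core regardless of how many resources the cluster has free. If dynamic allocation is enabled on the cluster, idle executors being released while the last task runs would also be expected. Counting rows per join key on the real input tables would be the analogous check.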

Any guidance is greatly appreciated.

People

Assignee: Unassigned

Reporter: Mike Chan

Votes: 0

Watchers: 3

Dates

Created: 24/Mar/19 15:34

Updated: 26/Mar/19 06:58

Resolved: 26/Mar/19 06:58