[SPARK-30444] The same job is computed many times when using Dataset.show() - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

When I run the example sql.SparkSQLExample, df.show() at line 60 triggers an action. In the web UI, I noticed that this call creates 5 jobs, all of which have the same lineage graph, the same RDDs, and the same call stack. That means Spark recomputes the job 5 times. Strangely, sqlDF.show() at line 123 creates only 1 job.

I don't know what happens inside show() at line 60.
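One possible explanation, not confirmed anywhere in this issue, is that show() fetches its rows through an incremental take: it scans one partition first and, if that does not yield enough rows, launches further jobs over progressively more partitions of the same plan. The sketch below is a simplified pure-Python model of that behavior, not Spark code. The function name `simulate_take_jobs`, its parameters, and the partition layout are hypothetical. The scale-up factor of 4 is assumed to match the default of `spark.sql.limit.scaleUpFactor`, and 21 rows is assumed to correspond to the default show() of 20 rows plus one extra row.

```python
import math


def simulate_take_jobs(rows_per_partition, n, scale_up_factor=4):
    """Simplified model of an incremental take(n).

    rows_per_partition: hypothetical number of rows held by each partition.
    Returns a list with the number of partitions scanned by each simulated job.
    """
    total_parts = len(rows_per_partition)
    scale_up_factor = max(scale_up_factor, 2)
    collected = 0       # rows gathered so far
    parts_scanned = 0   # partitions already visited
    jobs = []

    while collected < n and parts_scanned < total_parts:
        num_parts_to_try = 1  # the first job looks at a single partition
        if parts_scanned > 0:
            if collected == 0:
                # Nothing found yet: multiply the scan width and retry.
                num_parts_to_try = parts_scanned * scale_up_factor
            else:
                # Estimate how many partitions are still needed, overestimating by 50%.
                left = n - collected
                num_parts_to_try = math.ceil(1.5 * left * parts_scanned / collected)
                num_parts_to_try = min(num_parts_to_try, parts_scanned * scale_up_factor)

        end = min(parts_scanned + num_parts_to_try, total_parts)
        found = sum(rows_per_partition[parts_scanned:end])
        collected = min(n, collected + found)
        jobs.append(end - parts_scanned)
        parts_scanned = end

    return jobs


# Hypothetical layout: 200 partitions, with 3 rows only in the last one.
sparse = [0] * 199 + [3]
print(simulate_take_jobs(sparse, 21))   # [1, 4, 20, 100, 75]

# Hypothetical layout: a single partition holding all 3 rows.
print(simulate_take_jobs([3], 21))      # [1]
```

Under these assumptions, a result spread over 200 mostly empty partitions needs 5 jobs, while a single-partition result needs 1. That would be consistent with the 5 jobs versus 1 job seen in the web UI, though the actual partition counts behind the two show() calls are not given in this report.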


People

Assignee: Unassigned

Reporter: IcySanwitch

Votes: 0

Watchers: 2

Dates

Created: 07/Jan/20 01:55

Updated: 12/Dec/22 18:11