  1. Spark
  2. SPARK-30444

The same job is computed multiple times when using Dataset.show()

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When I run the example sql.SparkSQLExample, df.show() at line 60 triggers an action. In the web UI, I noticed that this call creates 5 jobs, all of which have the same lineage graph, the same RDDs, and the same call stacks. This suggests that Spark recomputes the same job 5 times. Strangely, sqlDF.show() at line 123 creates only 1 job.

      I don't know what happens inside show() at line 60 that causes this.
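
      A minimal sketch for checking the job counts without reading them off the web UI, assuming the two calls in question are the df.show() on the DataFrame read from people.json and the sqlDF.show() on the "people" temp view. The object name ShowJobCount and the helper countJobs are hypothetical and not part of the example. The path assumes the program is run from the root of a Spark source checkout.

```scala
import org.apache.spark.sql.SparkSession

object ShowJobCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShowJobCount")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical helper: run `body` under its own job group, then ask the
    // status tracker which job IDs were recorded for that group.
    // The tracker is updated asynchronously, so the count may lag briefly.
    def countJobs(group: String)(body: => Unit): Int = {
      sc.setJobGroup(group, s"jobs triggered by $group")
      try body finally sc.clearJobGroup()
      sc.statusTracker.getJobIdsForGroup(group).length
    }

    // Assumed to correspond to the df.show() call mentioned in the report.
    val df = spark.read.json("examples/src/main/resources/people.json")
    val dfJobs = countJobs("df.show")(df.show())

    // Assumed to correspond to the sqlDF.show() call mentioned in the report.
    df.createOrReplaceTempView("people")
    val sqlDF = spark.sql("SELECT * FROM people")
    val sqlJobs = countJobs("sqlDF.show")(sqlDF.show())

    println(s"df.show() jobs: $dfJobs, sqlDF.show() jobs: $sqlJobs")
    spark.stop()
  }
}
```

      Tagging each call with its own job group keeps the schema-inference job started by spark.read.json out of the count, so only jobs started by show() itself are attributed to each call.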

          People

            Assignee: Unassigned
            Reporter: IcySanwitch (spark_cachecheck)
            Votes: 0
            Watchers: 2
