SPARK-25377: Spark SQL DataFrame cache is invalid


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: spark version 2.3.0, scala version 2.1.8

    Description

        When I use a Spark SQL DataFrame in my application, I found that dataframe.cache() does not take effect: the first action, such as count(), took about 40 seconds, and the second action took about the same time. When I use dataframe.rdd.cache instead, the second execution time is less than the first. So I think this is a bug in the SQL DataFrame.

     Below are my code and the console log; the result DataFrame has already been cached before this point.

   My code:
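      // `result` was cached with result.cache() earlier in the program; count() is
      // triggered twice here to check whether the second action hits the cache.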

      logger.info("start to consuming result count")
      logger.info(s"consuming ${result.count} output records")
      //result.show(false)
      logger.info("starting go to MysqlSink")
      logger.info(s"consuming ${result.count} output records")
      logger.info("starting go to MysqlSink")

       

      And the console log is below:

      18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
      18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
      18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
      18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
      18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
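
For reference, here is a minimal, self-contained sketch of the caching pattern discussed above. DataFrame.cache() is lazy: it only marks the plan for caching, the cache is filled while the first action runs, and only later actions are expected to benefit. The DataFrame built below is a hypothetical stand-in for the reporter's result (the original query is not shown in the ticket), so the timings will differ.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object CacheCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CacheCheck")
          .master("local[*]") // local mode, only for this sketch
          .getOrCreate()

        // Hypothetical stand-in for the reporter's `result` DataFrame.
        val result = spark.range(0L, 10000000L).toDF("id")
          .selectExpr("id", "id % 7 AS bucket")

        // cache() is lazy: it marks the plan for caching but computes nothing yet.
        result.cache()

        // Small timing helper so the two actions can be compared.
        def timed[T](label: String)(body: => T): T = {
          val start = System.nanoTime()
          val out = body
          println(f"$label took ${(System.nanoTime() - start) / 1e6}%.0f ms")
          out
        }

        timed("first count (fills the cache)")(result.count())
        timed("second count (should read from the cache)")(result.count())

        // Workaround mentioned in the report: cache the underlying RDD instead.
        val rdd = result.rdd.persist(StorageLevel.MEMORY_AND_DISK)
        timed("first rdd count")(rdd.count())
        timed("second rdd count")(rdd.count())

        spark.stop()
      }
    }

If the second count on the real query is still as slow as the first, the Storage tab of the Spark UI (or result.storageLevel) shows whether the cached plan was actually materialized; that usually narrows down whether the cache was never filled, was evicted, or was simply not reused.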



People

    • Assignee: Unassigned
    • Reporter: Iverson Hu
    • Votes: 1
    • Watchers: 3
