Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Environment:
Spark version 2.3.0
Scala version 2.1.8
Description
When I use a SQL DataFrame in my application, dataframe.cache appears to have no effect: the first action, such as count(), took about 40 seconds, and the second execution of the same action took just as long. When I use dataframe.rdd.cache instead, the second execution is faster than the first. I think this is a bug in SQL DataFrame caching.
Below are my code and the console log. The result DataFrame had already been cached before this code ran.
logger.info("start to consuming result count")
logger.info(s"consuming ${result.count} output records")
//result.show(false)
logger.info("starting go to MysqlSink")
logger.info(s"consuming ${result.count} output records")
logger.info("starting go to MysqlSink")
The console log is below:
18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
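
For anyone trying to reproduce this, a minimal sketch of a timing check is given below. It is not the reporter's application: the `result` DataFrame here is a hypothetical stand-in built from `spark.range`, and the object name, the `timed` helper, and the local master setting are assumptions made for illustration. It relies on two documented behaviors: `cache()` is lazy, so the first action still pays the full computation cost, and the cache is only used by later actions whose plan matches the cached one, so both counts are run on the same cached Dataset.

```scala
import org.apache.spark.sql.SparkSession

object CacheTimingSketch {
  // Runs `body` once and returns its value with the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val value = body
    (value, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would take its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("cache-timing-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical stand-in for the `result` DataFrame in the report.
    val result = spark.range(0L, 1000000L)
      .selectExpr("id % 5 AS bucket")
      .distinct()

    // cache() only marks the plan for caching; nothing is materialized until an action runs.
    val cached = result.cache()
    println(s"storage level after cache(): ${cached.storageLevel}")

    // The first count computes the data and fills the cache.
    val (firstCount, firstMs) = timed(cached.count())
    // The second count runs on the same cached Dataset, so it can read from the cache.
    val (secondCount, secondMs) = timed(cached.count())

    println(s"first count:  $firstCount records in $firstMs ms")
    println(s"second count: $secondCount records in $secondMs ms")

    spark.stop()
  }
}
```

If the second count is not faster in a check like this, the Storage tab of the Spark UI shows whether the DataFrame was actually materialized in the cache, which helps separate a caching problem from a plan that simply differs from the cached one.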