SPARK-25377: Spark SQL DataFrame cache is invalid


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: spark version 2.3.0, scala version 2.1.8

    Description

        When I use a Spark SQL DataFrame in my application, I found that dataframe.cache() does not take effect: the first action, such as count(), took about 40 seconds, and the second action took about the same time. When I use dataframe.rdd.cache instead, the second execution time is less than the first. So I think this is a bug in the SQL DataFrame.

     Below are my code and the console log; the result DataFrame has already been cached before this point.

   My code:
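      // `result` was cached with result.cache() earlier in the program; count() is
      // triggered twice here to check whether the second action hits the cache.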

      logger.info("start to consuming result count")
      logger.info(s"consuming ${result.count} output records")
      //result.show(false)
      logger.info("starting go to MysqlSink")
      logger.info(s"consuming ${result.count} output records")
      logger.info("starting go to MysqlSink")

       

      And the console log is below:

      18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
      18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
      18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
      18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
      18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
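
For reference, here is a minimal, self-contained sketch of the caching pattern discussed above. DataFrame.cache() is lazy: it only marks the plan for caching, the cache is filled while the first action runs, and only later actions are expected to benefit. The DataFrame built below is a hypothetical stand-in for the reporter's result (the original query is not shown in the ticket), so the timings will differ.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object CacheCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CacheCheck")
          .master("local[*]") // local mode, only for this sketch
          .getOrCreate()

        // Hypothetical stand-in for the reporter's `result` DataFrame.
        val result = spark.range(0L, 10000000L).toDF("id")
          .selectExpr("id", "id % 7 AS bucket")

        // cache() is lazy: it marks the plan for caching but computes nothing yet.
        result.cache()

        // Small timing helper so the two actions can be compared.
        def timed[T](label: String)(body: => T): T = {
          val start = System.nanoTime()
          val out = body
          println(f"$label took ${(System.nanoTime() - start) / 1e6}%.0f ms")
          out
        }

        timed("first count (fills the cache)")(result.count())
        timed("second count (should read from the cache)")(result.count())

        // Workaround mentioned in the report: cache the underlying RDD instead.
        val rdd = result.rdd.persist(StorageLevel.MEMORY_AND_DISK)
        timed("first rdd count")(rdd.count())
        timed("second rdd count")(rdd.count())

        spark.stop()
      }
    }

If the second count on the real query is still as slow as the first, the Storage tab of the Spark UI (or result.storageLevel) shows whether the cached plan was actually materialized; that usually narrows down whether the cache was never filled, was evicted, or was simply not reused.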



People

    • Assignee: Unassigned
    • Reporter: Iverson Hu
    • Votes: 1
    • Watchers: 3
