Spark / SPARK-20927

Add cache operator to Unsupported Operations in Structured Streaming


Details

    Description

      cache is not allowed on streaming Datasets.

      Calling cache on a streaming Dataset throws the following AnalysisException:

      scala> spark.readStream.text("files").cache
      org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
      FileSource[files]
        at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
        at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36)
        at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
        at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34)
        at org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
        at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104)
        at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68)
        at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92)
        at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603)
        at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613)
        ... 48 elided
      

      cache should be listed among the Unsupported Operations in the Structured Streaming documentation.
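Until this is documented, one way to avoid the exception is to check `Dataset.isStreaming` before calling `cache`. The following is a minimal sketch, not a proposed fix: the object name `CacheOnStreamingSketch` and the helper `cacheIfBatch` are hypothetical, the `files` directory from the report is assumed to exist, and the console sink is only one possible choice of sink.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CacheOnStreamingSketch {
  // Hypothetical helper: cache batch Datasets only.
  // Streaming Datasets are returned unchanged, so the AnalysisException
  // from the report is never triggered.
  def cacheIfBatch[T](ds: Dataset[T]): Dataset[T] =
    if (ds.isStreaming) ds else ds.cache()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-on-streaming-sketch")
      .master("local[*]")
      .getOrCreate()

    // Same streaming source as in the report; "files" is assumed to exist.
    val lines = spark.readStream.text("files")

    // Returns `lines` as-is because lines.isStreaming is true.
    val safe = cacheIfBatch(lines)

    // Streaming queries have to be executed through writeStream.start(),
    // as the exception message says.
    val query = safe.writeStream.format("console").start()
    query.awaitTermination()
  }
}
```

The guard only sidesteps the error. It does not cache anything for the streaming case.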

            People

              Assignee: Unassigned
              Reporter: Jacek Laskowski (jlaskowski)