Description
If a DataFrame's plan overrides sameResult without comparing canonicalized plans, the data cached by cacheTable is not used for that DataFrame.
For example:
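    import org.apache.spark.sql.execution.columnar.InMemoryRelation
    import spark.implicits._  // provides toDF() on Seq

    val localRelation = Seq(1, 2, 3).toDF()
    localRelation.createOrReplaceTempView("localRelation")
    spark.catalog.cacheTable("localRelation")
    // The DataFrame's plan should now be served from the in-memory cache
    assert(
      localRelation.queryExecution.withCachedData.collect {
        case i: InMemoryRelation => i
      }.size == 1)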
This fails with:
ArrayBuffer() had size 0 instead of expected size 1
The reason is that spark.catalog.cacheTable("localRelation") makes CacheManager cache the plan wrapped in a SubqueryAlias, whereas planning the DataFrame localRelation makes CacheManager look up the unwrapped plan, because the DataFrame's own plan is not wrapped in a SubqueryAlias.
Some plans, such as LocalRelation and LogicalRDD, override the sameResult method but do not compare canonicalized plans, so CacheManager cannot detect that the wrapped and unwrapped plans produce the same result.
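To illustrate the mismatch, here is a minimal, self-contained Scala sketch that does not depend on Spark. AliasNode, BuggyLeaf, FixedLeaf and ToyCacheManager are hypothetical stand-ins for SubqueryAlias, a leaf plan such as LocalRelation, and CacheManager. As a simplifying assumption, canonicalization here just strips the alias wrapper, and cache lookup calls sameResult on the plan being looked up against each cached plan.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of logical plans (not Spark's real classes).
sealed trait Plan {
  // Canonical form: by default a plan is its own canonical form.
  def canonicalized: Plan = this
  // Default sameResult compares canonicalized plans.
  def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized
}

// Stand-in for SubqueryAlias: the alias name does not affect the result,
// so canonicalization removes the wrapper.
final case class AliasNode(name: String, child: Plan) extends Plan {
  override def canonicalized: Plan = child.canonicalized
}

// Buggy pattern: overrides sameResult and compares `other` directly,
// so a wrapped copy of the same data is never recognized.
final case class BuggyLeaf(data: Seq[Int]) extends Plan {
  override def sameResult(other: Plan): Boolean = other match {
    case BuggyLeaf(d) => d == data
    case _            => false
  }
}

// Fixed pattern: keeps the default sameResult, which compares canonicalized plans.
final case class FixedLeaf(data: Seq[Int]) extends Plan

// Hypothetical toy cache: lookup finds a cached plan with the same result.
final class ToyCacheManager {
  private val cached = ArrayBuffer.empty[Plan]
  def cache(plan: Plan): Unit = cached += plan
  def lookup(plan: Plan): Option[Plan] = cached.find(c => plan.sameResult(c))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3)

    // cacheTable caches the aliased plan; the DataFrame looks up the bare plan.
    val buggy = new ToyCacheManager
    buggy.cache(AliasNode("localRelation", BuggyLeaf(data)))
    println(s"buggy lookup hit: ${buggy.lookup(BuggyLeaf(data)).isDefined}") // false

    val fixed = new ToyCacheManager
    fixed.cache(AliasNode("localRelation", FixedLeaf(data)))
    println(s"fixed lookup hit: ${fixed.lookup(FixedLeaf(data)).isDefined}") // true
  }
}
```

The two leaves differ only in whether sameResult looks through the wrapper via canonicalized; in this model that alone turns the cache miss into a hit.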