Details
Description
From the user list (/cc chinnitv): when the same relation exists twice in a query plan, our new caching logic replaces both instances with identical replacements, so both sides of the self-join end up sharing the same attribute ids. The bug can be seen in the following transformation:
=== Applying Rule org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions ===

Before:
Project [state#59,month#60]
 Join Inner, Some(((state#69 = state#59) && (month#70 = month#60)))
  MetastoreRelation default, orders, None
  Subquery ao
   Distinct
    Project [state#69,month#70]
     Join Inner, Some((id#81 = id#71))
      MetastoreRelation default, orders, None
      MetastoreRelation default, orderupdates, None

After:
'Project [state#105,month#106]
 'Join Inner, Some(((state#105 = state#105) && (month#106 = month#106)))
   Subquery orders
    Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
   Subquery ao
    Distinct
     Project [state#105,month#106]
      Join Inner, Some((id#115 = id#97))
       Subquery orders
        Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
       Subquery orderupdates
        Relation[id#115,category#116,make#117,type#118,price#119,pdate#120,customer#121,city#122,state#123,month#124] org.apache.spark.sql.parquet.ParquetRelation2
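The failure mode above can be illustrated outside of Spark. The following is a minimal sketch, not Spark's actual classes: the class, record, and method names (SelfJoinCacheBug, Attribute, Relation, convertCached, convertFresh) are all hypothetical. It shows how a naive per-table cache hands back the same converted relation, with the same attribute ids, for every occurrence in the plan, so a self-join condition rewrites into a tautology like (state#105 = state#105); re-instantiating with fresh ids per occurrence is the direction of the fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SelfJoinCacheBug {
    // Globally unique expression ids, analogous to the #59, #105, ... suffixes in the plan.
    static final AtomicInteger NEXT_ID = new AtomicInteger();

    /** An attribute identified by a globally unique expression id. */
    record Attribute(String name, int id) {}

    /** A converted relation exposing a single attribute for simplicity. */
    record Relation(String table, Attribute state) {}

    // Naive cache: one converted relation per table name.
    static final Map<String, Relation> CACHE = new HashMap<>();

    /** Buggy: every occurrence of the table gets the identical cached relation. */
    static Relation convertCached(String table) {
        return CACHE.computeIfAbsent(table,
            t -> new Relation(t, new Attribute("state", NEXT_ID.incrementAndGet())));
    }

    /** Fix direction: re-instantiate with fresh attribute ids per occurrence. */
    static Relation convertFresh(String table) {
        return new Relation(table, new Attribute("state", NEXT_ID.incrementAndGet()));
    }

    public static void main(String[] args) {
        // Self-join of `orders` with itself: both lookups hit the cache,
        // so the join condition becomes state#N = state#N for the same N.
        Relation left = convertCached("orders");
        Relation right = convertCached("orders");
        System.out.println("cached: state#" + left.state().id()
            + " = state#" + right.state().id());

        // Fresh conversion gives each occurrence distinct ids, as a
        // correct self-join requires.
        Relation l2 = convertFresh("orders");
        Relation r2 = convertFresh("orders");
        System.out.println("fresh:  state#" + l2.state().id()
            + " = state#" + r2.state().id());
    }
}
```

This mirrors the plan above: the inner join over orderupdates still sees distinct ids (id#115 = id#97) because those relations differ, but the outer self-join of orders collapses to identical ids on both sides.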