SPARK-6851

Wrong answers for self joins of converted parquet relations

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: SQL

    Description

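      From the user list (/cc chinnitv): when the same relation appears twice in a query plan, our new caching logic replaces both instances with the identical replacement, so both sides of a self join end up referring to the same attribute IDs. The bug can be seen in the following transformation, where the join condition (state#69 = state#59) becomes (state#105 = state#105):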

      === Applying Rule org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions ===
      !Project [state#59,month#60]                                           'Project [state#105,month#106]
      ! Join Inner, Some(((state#69 = state#59) && (month#70 = month#60)))    'Join Inner, Some(((state#105 = state#105) && (month#106 = month#106)))
      !  MetastoreRelation default, orders, None                               Subquery orders
      !  Subquery ao                                                            Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
      !   Distinct                                                             Subquery ao
      !    Project [state#69,month#70]                                          Distinct
      !     Join Inner, Some((id#81 = id#71))                                    Project [state#105,month#106]
      !      MetastoreRelation default, orders, None                              Join Inner, Some((id#115 = id#97))
      !      MetastoreRelation default, orderupdates, None                         Subquery orders
      !                                                                             Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
      !                                                                            Subquery orderupdates
      !                                                                             Relation[id#115,category#116,make#117,type#118,price#119,pdate#120,customer#121,city#122,state#123,month#124] org.apache.spark.sql.parquet.ParquetRelation2
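
The collapse seen in the trace can be reproduced with a small model. This is a minimal sketch in plain Python, not Spark code: `ConversionCache`, `convert_shared`, `convert_fresh`, and `self_join_condition` are hypothetical names invented for illustration, and integer IDs stand in for Catalyst attribute IDs such as `state#105`. The "fresh" variant shows one plausible remedy (re-numbering attributes for each occurrence of the relation); it is an assumption here, not a description of the actual patch.

```python
import itertools

# Global counter standing in for Catalyst's unique expression IDs (hypothetical model).
_next_id = itertools.count(1)


def fresh_attrs(columns):
    """Return {column: unique attribute id} for one occurrence of a relation."""
    return {col: next(_next_id) for col in columns}


class ConversionCache:
    """Hypothetical stand-in for a cache of converted relations, keyed by table name."""

    def __init__(self, columns_by_table):
        self.columns_by_table = columns_by_table
        self._cache = {}

    def convert_shared(self, table):
        # Buggy behaviour: every occurrence of the table receives the very
        # same replacement, and therefore the very same attribute IDs.
        if table not in self._cache:
            self._cache[table] = fresh_attrs(self.columns_by_table[table])
        return self._cache[table]

    def convert_fresh(self, table):
        # Assumed remedy: still consult the cache, but re-number the
        # attributes so that each occurrence is distinguishable.
        cached = self.convert_shared(table)
        return fresh_attrs(cached.keys())


def self_join_condition(convert, table, column):
    """Build the (left_id, right_id) pair of a self-join equality on `column`."""
    left = convert(table)
    right = convert(table)
    return (left[column], right[column])


cache = ConversionCache({"orders": ["id", "state", "month"]})
buggy = self_join_condition(cache.convert_shared, "orders", "state")
fixed = self_join_condition(cache.convert_fresh, "orders", "state")
print("shared replacement:", buggy, "degenerate:", buggy[0] == buggy[1])
print("fresh replacement: ", fixed, "degenerate:", fixed[0] == fixed[1])
```

With the shared replacement, both sides of the equality carry the same ID, so the condition is trivially true and the join returns wrong answers. With per-occurrence IDs the two sides stay distinct.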
      

          People

            Assignee: Michael Armbrust (marmbrus)
            Reporter: Michael Armbrust (marmbrus)