  1. Spark
  2. SPARK-6851

Wrong answers for self joins of converted parquet relations

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: SQL
    • Labels:

      Description

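      From the user list (/cc Anand Mohan Tumuluri): when the same relation appears twice in a query plan, our new caching logic replaces both instances with the identical replacement. Both sides of a self join then carry the same attribute IDs, so the join condition compares each column with itself (for example, state#105 = state#105). The bug can be seen in the following transformation: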

      === Applying Rule org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions ===
      !Project [state#59,month#60]                                           'Project [state#105,month#106]
      ! Join Inner, Some(((state#69 = state#59) && (month#70 = month#60)))    'Join Inner, Some(((state#105 = state#105) && (month#106 = month#106)))
      !  MetastoreRelation default, orders, None                               Subquery orders
      !  Subquery ao                                                            Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
      !   Distinct                                                             Subquery ao
      !    Project [state#69,month#70]                                          Distinct
      !     Join Inner, Some((id#81 = id#71))                                    Project [state#105,month#106]
      !      MetastoreRelation default, orders, None                              Join Inner, Some((id#115 = id#97))
      !      MetastoreRelation default, orderupdates, None                         Subquery orders
      !                                                                             Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
      !                                                                            Subquery orderupdates
      !                                                                             Relation[id#115,category#116,make#117,type#118,price#119,pdate#120,customer#121,city#122,state#123,month#124] org.apache.spark.sql.parquet.ParquetRelation2
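
      The failure mode in the trace above can be illustrated with a small toy model. This is only a sketch and not Spark's code: Attr, Relation, ConversionCache and the fresh_ids switch are hypothetical names made up for the illustration, and handing out fresh expression IDs on each lookup is just one plausible remedy, not necessarily the fix that was committed.

```python
from dataclasses import dataclass
from itertools import count

# Global counter standing in for an expression-ID generator (hypothetical).
_ids = count(1)


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int

    def __str__(self):
        return f"{self.name}#{self.expr_id}"


@dataclass(frozen=True)
class Relation:
    table: str
    output: tuple

    def new_instance(self):
        # Same underlying table, but every attribute gets a fresh ID.
        fresh = tuple(Attr(a.name, next(_ids)) for a in self.output)
        return Relation(self.table, fresh)


class ConversionCache:
    """Toy cache of converted relations, keyed by table name."""

    def __init__(self, fresh_ids):
        self.fresh_ids = fresh_ids
        self._cache = {}

    def convert(self, table, columns):
        if table not in self._cache:
            attrs = tuple(Attr(c, next(_ids)) for c in columns)
            self._cache[table] = Relation(table, attrs)
        cached = self._cache[table]
        # Buggy mode returns the cached object itself, so every occurrence
        # of the table in a plan shares the same attribute IDs.
        return cached.new_instance() if self.fresh_ids else cached


def self_join_condition(cache):
    # The same table appears twice in the plan, as in the trace.
    left = cache.convert("orders", ["state", "month"])
    right = cache.convert("orders", ["state", "month"])
    return [(str(l), str(r)) for l, r in zip(left.output, right.output)]


buggy = self_join_condition(ConversionCache(fresh_ids=False))
fixed = self_join_condition(ConversionCache(fresh_ids=True))
print("buggy:", buggy)
print("fixed:", fixed)
```

      In the buggy mode every pair compares an attribute with itself, which mirrors the (state#105 = state#105) condition in the trace. With fresh IDs per lookup, the two sides of the join stay distinguishable.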
      

            People

            • Assignee:
              marmbrus Michael Armbrust
              Reporter:
              marmbrus Michael Armbrust