Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-43541

Incorrect column resolution on FULL OUTER JOIN with USING

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.2, 3.4.0
    • 3.3.3, 3.4.1, 3.5.0
    • SQL
    • None

    Description

      This was tested on Spark 3.3.2 and Spark 3.4.0.

      Causes [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? [`key`].; line 4, pos 7
      

      FULL OUTER JOIN with USING and/or the WHERE seems relevant since I can get the query to work with any of these modifications.

      # -- FULL OUTER JOIN
                         WITH
                         aws_dbr_a AS (select key from values ('a') t(key)),
                         gcp_pro_b AS (select key from values ('a') t(key))
                         SELECT aws_dbr_a.key
                         FROM aws_dbr_a
                         FULL OUTER JOIN gcp_pro_b USING (key)
                         WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
      [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `aws_dbr_a`.`key` cannot be resolved. Did you mean one of the following? [`key`].; line 4 pos 7
      # -- INNER JOIN
                         WITH
                         aws_dbr_a AS (select key from values ('a') t(key)),
                         gcp_pro_b AS (select key from values ('a') t(key))
                         SELECT aws_dbr_a.key
                         FROM aws_dbr_a
                         JOIN gcp_pro_b USING (key)
                         WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
      +-----+
      | key |
      |-----|
      | a   |
      +-----+
      1 row in set
      Time: 0.507s
      # -- NO Filter
                         WITH
                         aws_dbr_a AS (select key from values ('a') t(key)),
                         gcp_pro_b AS (select key from values ('a') t(key))
                         SELECT aws_dbr_a.key
                         FROM aws_dbr_a
                         FULL OUTER JOIN gcp_pro_b USING (key);
      +-----+
      | key |
      |-----|
      | a   |
      +-----+
      1 row in set
      Time: 1.021s
      # -- ON instead of USING
                         WITH
                         aws_dbr_a AS (select key from values ('a') t(key)),
                         gcp_pro_b AS (select key from values ('a') t(key))
                         SELECT aws_dbr_a.key
                         FROM aws_dbr_a
                         FULL OUTER JOIN gcp_pro_b ON aws_dbr_a.key = gcp_pro_b.key
                         WHERE aws_dbr_a.key NOT LIKE 'spark.clusterUsageTags.%';
      +-----+
      | key |
      |-----|
      | a   |
      +-----+
      1 row in set
      Time: 0.514s
      

      Attachments

        Activity

          People

            maxgekk Max Gekk
            maxgekk Max Gekk
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: