Hive
  1. Hive
  2. HIVE-6393

Support unqualified column references in Joining conditions

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Support queries of the form:

      create table r1(a int);
      create table r2(b);
      select a, b
      from r1 join r2 on a = b
      

      This becomes more useful in old style syntax:

      select a, b
      from r1, r2
      where a = b
      
      1. HIVE-6393.3.patch
        154 kB
        Harish Butani
      2. HIVE-6393.2.patch
        152 kB
        Harish Butani
      3. HIVE-6393.1.patch
        150 kB
        Harish Butani

        Activity

        Hide
        Lefty Leverenz added a comment -

        At long last, this is documented at the end of the Join Syntax section in the wiki:

        To match the other version-info box, I included the example in the box instead of putting it at the end of the Examples section. Key words are capitalized and the create statements have semicolons.

        Show
        Lefty Leverenz added a comment - At long last, this is documented at the end of the Join Syntax section in the wiki: LanguageManual Joins – Join Syntax To match the other version-info box, I included the example in the box instead of putting it at the end of the Examples section. Key words are capitalized and the create statements have semicolons.
        Hide
        Harish Butani added a comment -

        I'd say "Hive" instead of "we" and mention the release number (with a link to this jira). Or should it be "the query optimizer" instead of "we"?

        Agreed, I prefer Hive 0.13 over 'the query optimizer'

        This can go in a version-info box at the end of the Join Syntax section

        Agreed.

        so the first example in the wikidoc can't use unqualified column references

        Yes
        A simple e.g. is:

        create table A(k1 string, v1 string)
        create table B(k2 string, v2 string)
        
        select k1, v1, k2, v2
        from A join B on k1 = k2;
        

        I can put this in the wiki after you fine-tune it, unless you'd rather do it yourself.

        Go ahead. Thanks for doing it.

        Show
        Harish Butani added a comment - I'd say "Hive" instead of "we" and mention the release number (with a link to this jira). Or should it be "the query optimizer" instead of "we"? Agreed, I prefer Hive 0.13 over 'the query optimizer' This can go in a version-info box at the end of the Join Syntax section Agreed. so the first example in the wikidoc can't use unqualified column references Yes A simple e.g. is: create table A(k1 string, v1 string) create table B(k2 string, v2 string) select k1, v1, k2, v2 from A join B on k1 = k2; I can put this in the wiki after you fine-tune it, unless you'd rather do it yourself. Go ahead. Thanks for doing it.
        Hide
        Lefty Leverenz added a comment -

        Good, although I'd say "Hive" instead of "we" and mention the release number (with a link to this jira). Or should it be "the query optimizer" instead of "we"?

        This can go in a version-info box at the end of the Join Syntax section, then a simple example can be added just for emphasis at the end of the Examples section. The syntax itself doesn't need any changes.

        Your second sentence says that the column names can't be identical, so the first example in the wikidoc can't use unqualified column references: "SELECT a.* FROM a JOIN b ON (a.id = b.id)". But is it more restrictive than that – neither table can have an (unreferenced) column named the same as the other table's referenced column? Perhaps I'm overthinking this.

        I can put this in the wiki after you fine-tune it, unless you'd rather do it yourself.

        Show
        Lefty Leverenz added a comment - Good, although I'd say "Hive" instead of "we" and mention the release number (with a link to this jira). Or should it be "the query optimizer" instead of "we"? This can go in a version-info box at the end of the Join Syntax section, then a simple example can be added just for emphasis at the end of the Examples section. The syntax itself doesn't need any changes. Your second sentence says that the column names can't be identical, so the first example in the wikidoc can't use unqualified column references: "SELECT a.* FROM a JOIN b ON (a.id = b.id)". But is it more restrictive than that – neither table can have an (unreferenced) column named the same as the other table's referenced column? Perhaps I'm overthinking this. I can put this in the wiki after you fine-tune it, unless you'd rather do it yourself.
        Hide
        Harish Butani added a comment -

        Yes, there should be documentation for this. Here is a shot at it, please revise/edit:

        Unqualified column references are now supported in join conditions. We attempt to resolve these against the inputs to a Join.
        If an unqualified column reference resolves to more than 1 table we will flag this as an ambiguous reference.
        
        Show
        Harish Butani added a comment - Yes, there should be documentation for this. Here is a shot at it, please revise/edit: Unqualified column references are now supported in join conditions. We attempt to resolve these against the inputs to a Join. If an unqualified column reference resolves to more than 1 table we will flag this as an ambiguous reference.
        Hide
        Lefty Leverenz added a comment -

        This needs wiki documentation and perhaps a release note. (Is the fix version 0.13?)

        Here's the Joins doc:

        Show
        Lefty Leverenz added a comment - This needs wiki documentation and perhaps a release note. (Is the fix version 0.13?) Here's the Joins doc: Language Manual: Hive Joins
        Hide
        Harish Butani added a comment -

        ran all tests locally. they passed.

        Show
        Harish Butani added a comment - ran all tests locally. they passed.
        Hide
        Gunther Hagleitner added a comment -

        some comments on rb. otherwise +1.

        Show
        Gunther Hagleitner added a comment - some comments on rb. otherwise +1.
        Show
        Harish Butani added a comment - https://reviews.apache.org/r/18293/
        Hide
        Gunther Hagleitner added a comment -

        Harish Butani could you open a review board request for this one?

        Show
        Gunther Hagleitner added a comment - Harish Butani could you open a review board request for this one?
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12630532/HIVE-6393.2.patch

        ERROR: -1 due to 2 failed/errored test(s), 5181 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1478/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1478/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 2 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12630532

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630532/HIVE-6393.2.patch ERROR: -1 due to 2 failed/errored test(s), 5181 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1478/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1478/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12630532
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12629966/HIVE-6393.1.patch

        ERROR: -1 due to 5 failed/errored test(s), 5180 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual1
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual2
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual3
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual4
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1438/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1438/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 5 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12629966

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12629966/HIVE-6393.1.patch ERROR: -1 due to 5 failed/errored test(s), 5180 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1438/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1438/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12629966

          People

          • Assignee:
            Harish Butani
            Reporter:
            Harish Butani
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development