Hive
  1. Hive
  2. HIVE-494

Select columns by index instead of name

    Details

    • Type: Wish Wish
    • Status: Patch Available
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Clients, Query Processor
    • Labels:

      Description

      SELECT mytable[0], mytable[2] FROM some_table_name mytable;

      ...should return the first and third columns, respectively, from mytable regardless of their column names.

      The need for "names" specifically is kind of silly when they just get translated into numbers anyway.

        Issue Links

          Activity

          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-494 [jira] Select columns by index instead of name".
          Reviewers: JIRA

          DPAL-732 Select columns by index instead of name

          SELECT mytable<span class="error">[0]</span>, mytable<span class="error">[2]</span> FROM some_table_name mytable;

          ...should return the first and third columns, respectively, from mytable regardless of their column names.

          The need for "names" specifically is kind of silly when they just get translated into numbers anyway.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D1641

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          ql/src/test/queries/clientpositive/select_by_column_index.q
          ql/src/test/results/clientpositive/select_by_column_index.q.out

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/3501/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          Phabricator added a comment - navis requested code review of " HIVE-494 [jira] Select columns by index instead of name". Reviewers: JIRA DPAL-732 Select columns by index instead of name SELECT mytable<span class="error"> [0] </span>, mytable<span class="error"> [2] </span> FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for "names" specifically is kind of silly when they just get translated into numbers anyway. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D1641 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/test/queries/clientpositive/select_by_column_index.q ql/src/test/results/clientpositive/select_by_column_index.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3501/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
          Hide
          Phabricator added a comment -

          navis has commented on the revision "HIVE-494 [jira] Select columns by index instead of name".

          Simple patch, but was very useful for me to implement query generator for random-forest.

          REVISION DETAIL
          https://reviews.facebook.net/D1641

          Show
          Phabricator added a comment - navis has commented on the revision " HIVE-494 [jira] Select columns by index instead of name". Simple patch, but was very useful for me to implement query generator for random-forest. REVISION DETAIL https://reviews.facebook.net/D1641
          Hide
          Phabricator added a comment -

          cwsteinbach has commented on the revision "HIVE-494 [jira] Select columns by index instead of name".

          INLINE COMMENTS
          ql/src/test/queries/clientpositive/select_by_column_index.q:2 Is this syntax part of the SQL standard, or is it an extension specific to Hive?
          ql/src/test/queries/clientpositive/select_by_column_index.q:5 Please add coverage for JOINs, ORDER BY, HAVING, and UNION clauses, and a negative testcase that shows what happens when an invalid index is referenced.

          REVISION DETAIL
          https://reviews.facebook.net/D1641

          Show
          Phabricator added a comment - cwsteinbach has commented on the revision " HIVE-494 [jira] Select columns by index instead of name". INLINE COMMENTS ql/src/test/queries/clientpositive/select_by_column_index.q:2 Is this syntax part of the SQL standard, or is it an extension specific to Hive? ql/src/test/queries/clientpositive/select_by_column_index.q:5 Please add coverage for JOINs, ORDER BY, HAVING, and UNION clauses, and a negative testcase that shows what happens when an invalid index is referenced. REVISION DETAIL https://reviews.facebook.net/D1641
          Hide
          Navis added a comment -

          I've completely forgotten this patch and looking into it newly.

          1. It seemed not a standard SQL and also could be confusional especially with join/unions.
          2. The patch is using RR of input operator but it's clearly not correct.

          Could it be worth to be implemented?

          Show
          Navis added a comment - I've completely forgotten this patch and looking into it newly. 1. It seemed not a standard SQL and also could be confusional especially with join/unions. 2. The patch is using RR of input operator but it's clearly not correct. Could it be worth to be implemented?
          Hide
          Carl Steinbach added a comment -

          It seemed not a standard SQL and also could be confusional especially with join/unions.

          I'm a little worried about this too, mainly because it appears to extend SQL syntax and (speaking personally) I don't think I fully understand the long term impact of a change like this. If it turns out that this syntax is broken, deprecating it is going to be painful for all of the people who start using it.

          Could it be worth to be implemented?

          I think we should pass on this unless it turns out to be part of standard SQL, or we can point to some other DB like MySQL that already implements it. On a related note, HIVE-1947 covers implementing similar syntax for the ORDER BY clause, and it turns out that this is part of standard SQL. It's also interesting to note that ordinal column references in the WHERE clause aren't supported since it would result in ambiguous statements like this:

          SELECT a, b from src
          WHERE 1=1;
          
          Show
          Carl Steinbach added a comment - It seemed not a standard SQL and also could be confusional especially with join/unions. I'm a little worried about this too, mainly because it appears to extend SQL syntax and (speaking personally) I don't think I fully understand the long term impact of a change like this. If it turns out that this syntax is broken, deprecating it is going to be painful for all of the people who start using it. Could it be worth to be implemented? I think we should pass on this unless it turns out to be part of standard SQL, or we can point to some other DB like MySQL that already implements it. On a related note, HIVE-1947 covers implementing similar syntax for the ORDER BY clause, and it turns out that this is part of standard SQL. It's also interesting to note that ordinal column references in the WHERE clause aren't supported since it would result in ambiguous statements like this: SELECT a, b from src WHERE 1=1;
          Hide
          Navis added a comment -

          Agree on carl's opinion.

          Show
          Navis added a comment - Agree on carl's opinion.
          Hide
          Edward Capriolo added a comment -

          Carl Steinbach Navis

          I think we should commit this.

          • it is impossible to name a column 1
          • it is impossible to name a column alias 1

          If order by supports this I do not see group by can't? Do we want to reconsider this?I kinda like the feature.

          Show
          Edward Capriolo added a comment - Carl Steinbach Navis I think we should commit this. it is impossible to name a column 1 it is impossible to name a column alias 1 If order by supports this I do not see group by can't? Do we want to reconsider this?I kinda like the feature.
          Hide
          Xuefu Zhang added a comment -

          Pig supports this, though using $1, $2 syntax, which is useful and convenient in some sense. However, I didn't find it's in standard SQL. One downside of supporting this is that ordering starts to matter now in the select list. If I do "select a, b, c from T", the output is deterministic regardless T's schema (as long it has a, b, and c). On the other hand, if I do "select $1, $2, $3 from T" and if later on the table's schema is changed as (a, b, d, c), then my query will return a different data set.

          So, projecting by numbers is different from "they just get translated into numbers anyway".

          Adding columns is quite common in hadoop data. Of course, one can argue that columns should always add at the end, which doesn't happen that way always.

          Show
          Xuefu Zhang added a comment - Pig supports this, though using $1, $2 syntax, which is useful and convenient in some sense. However, I didn't find it's in standard SQL. One downside of supporting this is that ordering starts to matter now in the select list. If I do "select a, b, c from T", the output is deterministic regardless T's schema (as long it has a, b, and c). On the other hand, if I do "select $1, $2, $3 from T" and if later on the table's schema is changed as (a, b, d, c), then my query will return a different data set. So, projecting by numbers is different from "they just get translated into numbers anyway". Adding columns is quite common in hadoop data. Of course, one can argue that columns should always add at the end, which doesn't happen that way always.
          Hide
          Edward Capriolo added a comment -

          I think any user will realize that '$1' can change. In the end i think hive should be more dynamic somewhat like pig. Imagine something like this:

          create table x stored by dynamichandler;

          select $1 , $2 from x (inputformat=textinputformat, inpath=/x/y/z);

          We are close to this now because Navis added the ability to specify per query table properties.

          What is, or what is not in the SQL spec should not be our metric, we can already do amazing things that SQL can't so I want to keep innovating. As long as something does not produce an ambiguity in the language I see no harm in it.

          Show
          Edward Capriolo added a comment - I think any user will realize that '$1' can change. In the end i think hive should be more dynamic somewhat like pig. Imagine something like this: create table x stored by dynamichandler; select $1 , $2 from x (inputformat=textinputformat, inpath=/x/y/z); We are close to this now because Navis added the ability to specify per query table properties. What is, or what is not in the SQL spec should not be our metric, we can already do amazing things that SQL can't so I want to keep innovating. As long as something does not produce an ambiguity in the language I see no harm in it.
          Hide
          Edward Capriolo added a comment -

          I think we should also support negative numbers to query from the right end like awk's $NF

          Show
          Edward Capriolo added a comment - I think we should also support negative numbers to query from the right end like awk's $NF
          Hide
          Navis added a comment -

          The patch for this was made once and I expect it to be found somewhere in local git (regardless it might be based on hive-0.9 or older). It seemed very convenient especially for SQL generators. I'll look into this, in tomorrow.

          Show
          Navis added a comment - The patch for this was made once and I expect it to be found somewhere in local git (regardless it might be based on hive-0.9 or older). It seemed very convenient especially for SQL generators. I'll look into this, in tomorrow.
          Hide
          Navis added a comment -

          Preliminary patch for test/review

          Show
          Navis added a comment - Preliminary patch for test/review
          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-494 [jira] Select columns by index instead of name".

          Reviewers: JIRA

          HIVE-494 Select columns by index instead of name

          SELECT mytable[0], mytable[2] FROM some_table_name mytable;

          ...should return the first and third columns, respectively, from mytable regardless of their column names.

          The need for "names" specifically is kind of silly when they just get translated into numbers anyway.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D12153

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          ql/src/test/queries/clientpositive/select_by_column_index.q
          ql/src/test/results/clientpositive/select_by_column_index.q.out

          MANAGE HERALD RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/29043/

          To: JIRA, navis

          Show
          Phabricator added a comment - navis requested code review of " HIVE-494 [jira] Select columns by index instead of name". Reviewers: JIRA HIVE-494 Select columns by index instead of name SELECT mytable [0] , mytable [2] FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for "names" specifically is kind of silly when they just get translated into numbers anyway. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12153 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/test/queries/clientpositive/select_by_column_index.q ql/src/test/results/clientpositive/select_by_column_index.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/29043/ To: JIRA, navis
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12597403/HIVE-494.D12153.1.patch

          ERROR: -1 due to 1 failed/errored test(s), 2790 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/399/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/399/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597403/HIVE-494.D12153.1.patch ERROR: -1 due to 1 failed/errored test(s), 2790 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/399/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/399/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed This message is automatically generated.

            People

            • Assignee:
              Navis
              Reporter:
              Adam Kramer
            • Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:

                Development