Hive / HIVE-2567

Some CTAS queries with * and group by don't work.

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    1. HIVE-2567.2.patch
      13 kB
      Robert Surówka
    2. HIVE-2567.1.patch
      16 kB
      Robert Surówka

      Activity

      jiraposter@reviews.apache.org added a comment -

      -----------------------------------------------------------
      This is an automatically generated e-mail. To reply, visit:
      https://reviews.apache.org/r/2792/
      -----------------------------------------------------------

      (Updated 2011-11-11 23:47:08.652126)

      Review request for Ning Zhang and namit jain.

      Summary
      -------

      Properly supporting this case would, I believe, require a lot of work. This patch nevertheless seems to do the job well and is very unlikely to break anything.

      When the user puts tablename.columnname in the group by, the column produced by * is named tablename_columnname rather than columnname. This allows CTAS with * and joins on tables that share column names (as in the test). In very rare cases it may still lead to a duplicate column, as shown in the negative test.
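The renaming rule described above can be modelled in a few lines. This is only an illustrative sketch, not code from the patch: the function names `ctas_column_name` and `ctas_column_names` are hypothetical, and the colliding pair in the usage note is an assumed example of the "very rare" duplicate case, not the contents of the actual negative test.

```python
def ctas_column_name(group_by_expr: str) -> str:
    """Hypothetical model of the rule: 'tablename.columnname' -> 'tablename_columnname'.

    An unqualified name (no table prefix) is returned unchanged.
    """
    table, dot, column = group_by_expr.partition(".")
    return f"{table}_{column}" if dot else group_by_expr


def ctas_column_names(group_by_exprs):
    """Derive the output column names and reject duplicates.

    Raising ValueError here is an assumption standing in for whatever
    error the real semantic analysis reports.
    """
    names = [ctas_column_name(expr) for expr in group_by_exprs]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError("duplicate column name(s): " + ", ".join(duplicates))
    return names
```

Under this model, grouping by `t1.key` and `t2.key` gives two distinct columns, `t1_key` and `t2_key`, so a join on tables that share a column name can be written to a new table. A collision is still possible when the underscore already appears in the names: `a.b_c` and `a_b.c` both map to `a_b_c`.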

      This isn't the final resolution of the problem (that would require substantial changes), but it at least allows Hive to support many use cases with CTAS, * and group by.

      I also ran all the tests (with the -overwrite option) for this change; they executed correctly, and no output file of any existing test changed.

      This addresses bug HIVE-2567.
      https://issues.apache.org/jira/browse/HIVE-2567

      Diffs


      trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1199920
      trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1199920
      trunk/ql/src/test/queries/clientnegative/ctas_group_by_failure1.q PRE-CREATION
      trunk/ql/src/test/queries/clientpositive/input48.q PRE-CREATION
      trunk/ql/src/test/results/clientnegative/ctas_group_by_failure1.q.out PRE-CREATION
      trunk/ql/src/test/results/clientpositive/input48.q.out PRE-CREATION

      Diff: https://reviews.apache.org/r/2792/diff

      Testing
      -------

      Worked on some sample queries.

      Unit tests work too.

      Thanks,

      Robert

      Ning Zhang added a comment -

      Robert, this patch doesn't apply cleanly. Can you svn up and regenerate the patch?

      jiraposter@reviews.apache.org added a comment -


      (Updated 2011-11-14 18:06:11.336387)

      Review request for Ning Zhang and namit jain.

      Changes
      -------

      Updated the patch to apply cleanly on current trunk

      Summary
      -------

      Properly supporting this case would, I believe, require a lot of work. This patch nevertheless seems to do the job well and is very unlikely to break anything.

      When the user puts tablename.columnname in the group by, the column produced by * is named tablename_columnname rather than columnname. This allows CTAS with * and joins on tables that share column names (as in the test). In very rare cases it may still lead to a duplicate column, as shown in the negative test.

      This isn't the final resolution of the problem (that would require substantial changes), but it at least allows Hive to support many use cases with CTAS, * and group by.

      I also ran all the tests (with the -overwrite option) for this change; they executed correctly, and no output file of any existing test changed.

      This addresses bug HIVE-2567.
      https://issues.apache.org/jira/browse/HIVE-2567

      Diffs (updated)


      trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1201807
      trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1201807
      trunk/ql/src/test/queries/clientnegative/ctas_group_by_failure1.q PRE-CREATION
      trunk/ql/src/test/queries/clientpositive/input48.q PRE-CREATION
      trunk/ql/src/test/results/clientnegative/ctas_group_by_failure1.q.out PRE-CREATION
      trunk/ql/src/test/results/clientpositive/input48.q.out PRE-CREATION

      Diff: https://reviews.apache.org/r/2792/diff

      Testing
      -------

      Worked on some sample queries.

      Unit tests work too.

      Thanks,

      Robert

      He Yongqiang added a comment -

      will run tests


        People

        • Assignee: He Yongqiang
        • Reporter: Robert Surówka
        • Votes: 0
        • Watchers: 0
