Hive
  1. Hive
  2. HIVE-5129

Multiple table insert fails on count(distinct)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Hive fails with a class cast exception on queries of the form:

      from studenttab10k
      insert overwrite table multi_insert_2_1
      select name, avg(age) as avgage
      group by name
      
      insert overwrite table multi_insert_2_2
      select name, age, sum(gpa) as sumgpa
      group by name, age
      
      insert overwrite table multi_insert_2_3
      select name, count(distinct age) as distage
      group by name;
      
      1. HIVE-5129.4.patch.txt
        61 kB
        Vikram Dixit K
      2. HIVE-5129.4.patch
        61 kB
        Vikram Dixit K
      3. HIVE-5129.3.patch.txt
        41 kB
        Vikram Dixit K
      4. HIVE-5129.2.WIP.patch.txt
        2 kB
        Vikram Dixit K
      5. HIVE-5129.1.patch.txt
        1 kB
        Vikram Dixit K
      6. aggrTestMultiInsertData1.txt
        0.1 kB
        Vikram Dixit K
      7. aggrTestMultiInsertData.txt
        0.0 kB
        Vikram Dixit K
      8. aggrTestMultiInsert.q
        1 kB
        Vikram Dixit K

        Issue Links

          Activity

          Hide
          Ashutosh Chauhan added a comment -

          This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

          Show
          Ashutosh Chauhan added a comment - This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hive-trunk-h0.21 #2309 (See https://builds.apache.org/job/Hive-trunk-h0.21/2309/)
          HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Show
          Hudson added a comment - SUCCESS: Integrated in Hive-trunk-h0.21 #2309 (See https://builds.apache.org/job/Hive-trunk-h0.21/2309/ ) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-hadoop1-ptest #150 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/150/)
          HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-hadoop1-ptest #150 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/150/ ) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-hadoop2-ptest #83 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/83/)
          HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-hadoop2-ptest #83 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/83/ ) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-hadoop2 #401 (See https://builds.apache.org/job/Hive-trunk-hadoop2/401/)
          HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-hadoop2 #401 (See https://builds.apache.org/job/Hive-trunk-hadoop2/401/ ) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519764 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out
          Hide
          Harish Butani added a comment -

          Committed to trunk. Thanks, Vikram!

          Show
          Harish Butani added a comment - Committed to trunk. Thanks, Vikram!
          Hide
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12600910/HIVE-5129.4.patch

          SUCCESS: +1 2903 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/581/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/581/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600910/HIVE-5129.4.patch SUCCESS: +1 2903 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/581/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/581/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated.
          Hide
          Vikram Dixit K added a comment -

          Updated extension so that unit tests can run.

          Show
          Vikram Dixit K added a comment - Updated extension so that unit tests can run.
          Hide
          Harish Butani added a comment -

          +1
          will commit if tests pass

          Show
          Harish Butani added a comment - +1 will commit if tests pass
          Hide
          Vikram Dixit K added a comment -

          Addressed Harish's comments. Ran related unit tests.

          Show
          Vikram Dixit K added a comment - Addressed Harish's comments. Ran related unit tests.
          Hide
          Harish Butani added a comment -

          Looks good. Can you add a test that has different distincts, which will cause multiple Jobs to be genned.
          We should look at combining such clauses in the future.

          Navis we think your e.g. is a separate issue. What we observe is that the Key OI setup in your e.g. has only 1 field which is the Union Filed for key and value columns.
          For some reason the key column from the group by is removed from the Key OI. So what happens is each spray instance of the original row is being output.
          So can we look at this in a separate jira.

          Show
          Harish Butani added a comment - Looks good. Can you add a test that has different distincts, which will cause multiple Jobs to be genned. We should look at combining such clauses in the future. Navis we think your e.g. is a separate issue. What we observe is that the Key OI setup in your e.g. has only 1 field which is the Union Filed for key and value columns. For some reason the key column from the group by is removed from the Key OI. So what happens is each spray instance of the original row is being output. So can we look at this in a separate jira.
          Hide
          Vikram Dixit K added a comment -

          Navis I looked into this some more and I feel that the issue you have raised is valid but, is independent of this issue. Would it be OK to raise another jira for addressing that issue?

          I have updated the patch to retain as much of the parallelism as possible and added a test case for the same. Please take a look and let me know your feedback.

          Thanks
          Vikram.

          Show
          Vikram Dixit K added a comment - Navis I looked into this some more and I feel that the issue you have raised is valid but, is independent of this issue. Would it be OK to raise another jira for addressing that issue? I have updated the patch to retain as much of the parallelism as possible and added a test case for the same. Please take a look and let me know your feedback. Thanks Vikram.
          Hide
          Navis added a comment -

          It seemed current distinct implementation has flaw.

          select key, count(distinct key) + count(distinct value) from src tablesample (10 ROWS) group by key;
          
          100	1
          val_100	1
          165	1
          val_165	1
          238	1
          val_238	1
          255	1
          val_255	1
          27	1
          val_27	1
          278	1
          val_278	1
          311	1
          val_311	1
          409	1
          val_409	1
          86	1
          val_86	1
          98	1
          val_98	1
          

          this does not make sense.

          Show
          Navis added a comment - It seemed current distinct implementation has flaw. select key, count(distinct key) + count(distinct value) from src tablesample (10 ROWS) group by key; 100 1 val_100 1 165 1 val_165 1 238 1 val_238 1 255 1 val_255 1 27 1 val_27 1 278 1 val_278 1 311 1 val_311 1 409 1 val_409 1 86 1 val_86 1 98 1 val_98 1 this does not make sense.
          Hide
          Vikram Dixit K added a comment -

          Navis I think this patch addresses what you intended. Could you please take a look. I ran some tests on it and it looks ok. I will include your query above in the test and upload once I have your feedback.

          Thanks
          Vikram.

          Show
          Vikram Dixit K added a comment - Navis I think this patch addresses what you intended. Could you please take a look. I ran some tests on it and it looks ok. I will include your query above in the test and upload once I have your feedback. Thanks Vikram.
          Hide
          Navis added a comment -

          mGBY-1RS optimization is really confusing with distinct functions. IMHO, it should not be allowed to mix distinct and non-distinct cases into one group. For example,

          from src tablesample (10 ROWS)
          insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key;
          

          makes 20 rows for src_a, and,

          from src tablesample (10 ROWS)
          insert overwrite table src_b select key, count(value) group by key, value;
          

          makes 10 rows for src_b, but,

          from src tablesample (10 ROWS)
          insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key
          insert overwrite table src_b select key, count(value) group by key, value;
          

          makes 20 rows for src_a and src_b, and lastly,

          from src tablesample (10 ROWS)
          insert overwrite table src_b select key, count(value) group by key, value
          insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key;
          

          is not working as described in this issue. After applying your patch, it succeeded with 10 rows for both.

          Show
          Navis added a comment - mGBY-1RS optimization is really confusing with distinct functions. IMHO, it should not be allowed to mix distinct and non-distinct cases into one group. For example, from src tablesample (10 ROWS) insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key; makes 20 rows for src_a, and, from src tablesample (10 ROWS) insert overwrite table src_b select key, count(value) group by key, value; makes 10 rows for src_b, but, from src tablesample (10 ROWS) insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key insert overwrite table src_b select key, count(value) group by key, value; makes 20 rows for src_a and src_b, and lastly, from src tablesample (10 ROWS) insert overwrite table src_b select key, count(value) group by key, value insert overwrite table src_a select key, count(distinct key) + count(distinct value) group by key; is not working as described in this issue. After applying your patch, it succeeded with 10 rows for both.
          Hide
          Vikram Dixit K added a comment -
          Show
          Vikram Dixit K added a comment - Review board request: https://reviews.apache.org/r/13697/
          Hide
          Vikram Dixit K added a comment -

          Navis This issue is related to HIVE-4692. I have attached a patch which retains the same behavior as was before that patch without affecting the test added as part of it. However, I am not sure what else is affected by this down stream. I have started to run the unit tests to make sure I haven't broken other things. It would be great if you could take a look and provide feedback on this. Please let me know if you have a better approach in mind.

          Thanks!

          Show
          Vikram Dixit K added a comment - Navis This issue is related to HIVE-4692 . I have attached a patch which retains the same behavior as was before that patch without affecting the test added as part of it. However, I am not sure what else is affected by this down stream. I have started to run the unit tests to make sure I haven't broken other things. It would be great if you could take a look and provide feedback on this. Please let me know if you have a better approach in mind. Thanks!
          Hide
          Vikram Dixit K added a comment -

          Test and relevant data. Put the data files in the data/files directory in the source folder.

          Show
          Vikram Dixit K added a comment - Test and relevant data. Put the data files in the data/files directory in the source folder.

            People

            • Assignee:
              Vikram Dixit K
              Reporter:
              Vikram Dixit K
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development