Hive
  1. Hive
  2. HIVE-3699

Multiple insert overwrite into multiple tables query stores same results in all tables

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Query Processor
    • Labels:
      None
    • Environment:

      Cloudera 4.1 on Amazon Linux (rebranded Centos 6): hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch

    • Hadoop Flags:
      Reviewed

      Description

      (Note: This might be related to HIVE-2750)

      I am doing a query with multiple INSERT OVERWRITE to multiple tables in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the temp tables.

      Weird enough, if i had further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables, even the ones that would have been wrong content are all correctly populated.

      This is the misbehaving query:

      FROM nikon
      INSERT OVERWRITE TABLE e1
      SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions
      WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
      INSERT OVERWRITE TABLE e2
      SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues
      WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
      ;

      It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below)

      hive> SELECT * from e1;
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 0.229 seconds

      hive> SELECT * from e2;
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 0.11 seconds

      Here is are the result to the indiviual queries (only the second query returns a result set):

      hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions FROM nikon
      WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
      (...)
      OK
      <- There are no results, this is normal
      Time taken: 41.471 seconds

      hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues FROM nikon
      WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
      (...)
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 39.607 seconds

      1. HIVE-3699.D7743.1.patch
        21 kB
        Phabricator
      2. HIVE-3699.D7743.2.patch
        25 kB
        Phabricator
      3. HIVE-3699_hive-0.9.1.patch.txt
        23 kB
        Navis
      4. HIVE-3699.D7743.3.patch
        84 kB
        Phabricator

        Issue Links

          Activity

          Hide
          Shanzhong Zhu added a comment -

          Any updates on this item?

          We are also observing a similar issue.

          In the following query, the result of test17 was supposed to be empty. But test17 seems to have the same results as test18.

          FROM (
          SELECT info.product, info.sid, info.id, t.persona, info.service
          FROM info_table info JOIN main_tbl t ON info.service=t.service
          WHERE (info.id BETWEEN 17 AND 18) AND t.dt='2012-11-20' AND t.m='XXX1' AND t.g = 'XXX2' AND t.s = 'XXX3' ) u
          INSERT OVERWRITE TABLE test18 PARTITION (dt='2012-11-20', service)
          SELECT u.product, u.sid, u.id, u.persona, u.service
          WHERE u.id=18
          INSERT OVERWRITE TABLE test17 PARTITION (dt='2012-11-20', service)
          SELECT u.product, u.sid, u.id, u.persona, u.service
          WHERE u.id=17;

          Show
          Shanzhong Zhu added a comment - Any updates on this item? We are also observing a similar issue. In the following query, the result of test17 was supposed to be empty. But test17 seems to have the same results as test18. FROM ( SELECT info.product, info.sid, info.id, t.persona, info.service FROM info_table info JOIN main_tbl t ON info.service=t.service WHERE (info.id BETWEEN 17 AND 18) AND t.dt='2012-11-20' AND t.m='XXX1' AND t.g = 'XXX2' AND t.s = 'XXX3' ) u INSERT OVERWRITE TABLE test18 PARTITION (dt='2012-11-20', service) SELECT u.product, u.sid, u.id, u.persona, u.service WHERE u.id=18 INSERT OVERWRITE TABLE test17 PARTITION (dt='2012-11-20', service) SELECT u.product, u.sid, u.id, u.persona, u.service WHERE u.id=17;
          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables".
          Reviewers: JIRA

          DPAL-1952 Multiple insert overwrite into multiple tables query stores same results in all tables

          (Note: This might be related to HIVE-2750)

          I am doing a query with multiple INSERT OVERWRITE to multiple tables in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the temp tables.

          Weird enough, if i had further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables, even the ones that would have been wrong content are all correctly populated.

          This is the misbehaving query:

          FROM nikon
          INSERT OVERWRITE TABLE e1
          SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions
          WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
          INSERT OVERWRITE TABLE e2
          SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues
          WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
          ;

          It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below)

          hive> SELECT * from e1;
          OK
          NULL 2
          1627575 25
          1627576 70
          1690950 22
          1690952 42
          1696705 199
          1696706 66
          1696730 229
          1696759 85
          1696893 218
          Time taken: 0.229 seconds

          hive> SELECT * from e2;
          OK
          NULL 2
          1627575 25
          1627576 70
          1690950 22
          1690952 42
          1696705 199
          1696706 66
          1696730 229
          1696759 85
          1696893 218
          Time taken: 0.11 seconds

          Here is are the result to the indiviual queries (only the second query returns a result set):

          hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions FROM nikon
          WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
          (...)
          OK
          <- There are no results, this is normal
          Time taken: 41.471 seconds

          hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues FROM nikon
          WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
          (...)
          OK
          NULL 2
          1627575 25
          1627576 70
          1690950 22
          1690952 42
          1696705 199
          1696706 66
          1696730 229
          1696759 85
          1696893 218
          Time taken: 39.607 seconds

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D7743

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          ql/src/test/queries/clientpositive/multi_insert_gby.q
          ql/src/test/results/clientpositive/multi_insert_gby.q.out

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/18627/

          To: JIRA, navis

          Show
          Phabricator added a comment - navis requested code review of " HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables". Reviewers: JIRA DPAL-1952 Multiple insert overwrite into multiple tables query stores same results in all tables (Note: This might be related to HIVE-2750 ) I am doing a query with multiple INSERT OVERWRITE to multiple tables in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the temp tables. Weird enough, if i had further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables, even the ones that would have been wrong content are all correctly populated. This is the misbehaving query: FROM nikon INSERT OVERWRITE TABLE e1 SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid INSERT OVERWRITE TABLE e2 SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid ; It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below) hive> SELECT * from e1; OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 0.229 seconds hive> SELECT * from e2; OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 0.11 seconds Here is are the result to the indiviual queries (only the second query returns a result set): hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions FROM nikon WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid; (...) OK <- There are no results, this is normal Time taken: 41.471 seconds hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues FROM nikon WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid; (...) OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 39.607 seconds TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D7743 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java ql/src/test/queries/clientpositive/multi_insert_gby.q ql/src/test/results/clientpositive/multi_insert_gby.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/18627/ To: JIRA, navis
          Hide
          Navis added a comment -

          small bug in PPD

          for workaround, set hive.optimize.ppd=false;

          Show
          Navis added a comment - small bug in PPD for workaround, set hive.optimize.ppd=false;
          Hide
          Namit Jain added a comment -

          comments on phabricator

          Show
          Namit Jain added a comment - comments on phabricator
          Hide
          Phabricator added a comment -

          njain has commented on the revision "HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables".

          INLINE COMMENTS
          ql/src/test/queries/clientpositive/multi_insert_gby.q:17 The tests are not deterministic
          ql/src/test/queries/clientpositive/multi_insert_gby.q:32 same as above

          REVISION DETAIL
          https://reviews.facebook.net/D7743

          To: JIRA, navis
          Cc: njain

          Show
          Phabricator added a comment - njain has commented on the revision " HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables". INLINE COMMENTS ql/src/test/queries/clientpositive/multi_insert_gby.q:17 The tests are not deterministic ql/src/test/queries/clientpositive/multi_insert_gby.q:32 same as above REVISION DETAIL https://reviews.facebook.net/D7743 To: JIRA, navis Cc: njain
          Hide
          Phabricator added a comment -

          navis updated the revision "HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables".
          Reviewers: JIRA

          Addressed comment

          REVISION DETAIL
          https://reviews.facebook.net/D7743

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          ql/src/test/queries/clientpositive/multi_insert_gby.q
          ql/src/test/results/clientpositive/multi_insert_gby.q.out

          To: JIRA, navis
          Cc: njain

          Show
          Phabricator added a comment - navis updated the revision " HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables". Reviewers: JIRA Addressed comment REVISION DETAIL https://reviews.facebook.net/D7743 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java ql/src/test/queries/clientpositive/multi_insert_gby.q ql/src/test/results/clientpositive/multi_insert_gby.q.out To: JIRA, navis Cc: njain
          Hide
          Shanzhong Zhu added a comment -

          Thanks for the patch.

          But I got Hunk failed when applying to Hive 0.9.0.

          patch -p0 < ../../HIVE-3699.D7743.1.patch
          patching file ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          Hunk #1 succeeded at 33 (offset 1 line).
          Hunk #2 FAILED at 53.
          Hunk #3 FAILED at 584.
          Hunk #4 succeeded at 617 (offset -4 lines).
          2 out of 4 hunks FAILED – saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java.rej
          patching file ql/src/test/queries/clientpositive/multi_insert_gby.q
          patching file ql/src/test/results/clientpositive/multi_insert_gby.q.out

          Show
          Shanzhong Zhu added a comment - Thanks for the patch. But I got Hunk failed when applying to Hive 0.9.0. patch -p0 < ../../ HIVE-3699 .D7743.1.patch patching file ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java Hunk #1 succeeded at 33 (offset 1 line). Hunk #2 FAILED at 53. Hunk #3 FAILED at 584. Hunk #4 succeeded at 617 (offset -4 lines). 2 out of 4 hunks FAILED – saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java.rej patching file ql/src/test/queries/clientpositive/multi_insert_gby.q patching file ql/src/test/results/clientpositive/multi_insert_gby.q.out
          Hide
          Navis added a comment -

          for 0.9 branch

          Show
          Navis added a comment - for 0.9 branch
          Hide
          Phabricator added a comment -

          njain has accepted the revision "HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables".

          Great catch Navis

          REVISION DETAIL
          https://reviews.facebook.net/D7743

          BRANCH
          DPAL-1952

          To: JIRA, njain, navis
          Cc: njain

          Show
          Phabricator added a comment - njain has accepted the revision " HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables". Great catch Navis REVISION DETAIL https://reviews.facebook.net/D7743 BRANCH DPAL-1952 To: JIRA, njain, navis Cc: njain
          Hide
          Namit Jain added a comment -

          A lot of tests are failing - can you debug ?

          Show
          Namit Jain added a comment - A lot of tests are failing - can you debug ?
          Hide
          Phabricator added a comment -

          navis updated the revision "HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables".
          Reviewers: JIRA, njain

          Updated result of multi_insert.q,multi_insert_move_tasks_share_dependencies.q

          REVISION DETAIL
          https://reviews.facebook.net/D7743

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          ql/src/test/queries/clientpositive/multi_insert_gby.q
          ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
          ql/src/test/results/clientpositive/multi_insert.q.out
          ql/src/test/results/clientpositive/multi_insert_gby.q.out
          ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out

          To: JIRA, njain, navis
          Cc: njain

          Show
          Phabricator added a comment - navis updated the revision " HIVE-3699 [jira] Multiple insert overwrite into multiple tables query stores same results in all tables". Reviewers: JIRA, njain Updated result of multi_insert.q,multi_insert_move_tasks_share_dependencies.q REVISION DETAIL https://reviews.facebook.net/D7743 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java ql/src/test/queries/clientpositive/multi_insert_gby.q ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out ql/src/test/results/clientpositive/multi_insert.q.out ql/src/test/results/clientpositive/multi_insert_gby.q.out ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out To: JIRA, njain, navis Cc: njain
          Hide
          Namit Jain added a comment -

          +1

          We had so many checked in tests with wrong results.

          Show
          Namit Jain added a comment - +1 We had so many checked in tests with wrong results.
          Hide
          Namit Jain added a comment -

          Committed. Thanks Navis

          Show
          Namit Jain added a comment - Committed. Thanks Navis
          Hide
          Hudson added a comment -

          Integrated in hive-trunk-hadoop1 #26 (See https://builds.apache.org/job/hive-trunk-hadoop1/26/)
          HIVE-3699 Multiple insert overwrite into multiple tables query stores same results
          in all tables (Navis via namit) (Revision 1435484)

          Result = ABORTED
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q
          • /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
          Show
          Hudson added a comment - Integrated in hive-trunk-hadoop1 #26 (See https://builds.apache.org/job/hive-trunk-hadoop1/26/ ) HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit) (Revision 1435484) Result = ABORTED namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #74 (See https://builds.apache.org/job/Hive-trunk-hadoop2/74/)
          HIVE-3699 Multiple insert overwrite into multiple tables query stores same results
          in all tables (Navis via namit) (Revision 1435484)

          Result = FAILURE
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q
          • /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-hadoop2 #74 (See https://builds.apache.org/job/Hive-trunk-hadoop2/74/ ) HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit) (Revision 1435484) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1923 (See https://builds.apache.org/job/Hive-trunk-h0.21/1923/)
          HIVE-3699 Multiple insert overwrite into multiple tables query stores same results
          in all tables (Navis via namit) (Revision 1435484)

          Result = SUCCESS
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q
          • /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-h0.21 #1923 (See https://builds.apache.org/job/Hive-trunk-h0.21/1923/ ) HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit) (Revision 1435484) Result = SUCCESS namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out

            People

            • Assignee:
              Navis
              Reporter:
              Alexandre Fouché
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development