Hive
  1. Hive
  2. HIVE-3699

Multiple insert overwrite into multiple tables query stores same results in all tables

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Query Processor
    • Labels:
      None
    • Environment:

      Cloudera 4.1 on Amazon Linux (rebranded Centos 6): hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch

    • Hadoop Flags:
      Reviewed

      Description

      (Note: This might be related to HIVE-2750)

      I am doing a query with multiple INSERT OVERWRITE to multiple tables in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the temp tables.

      Weird enough, if i had further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables, even the ones that would have been wrong content are all correctly populated.

      This is the misbehaving query:

      FROM nikon
      INSERT OVERWRITE TABLE e1
      SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions
      WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
      INSERT OVERWRITE TABLE e2
      SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues
      WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
      ;

      It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below)

      hive> SELECT * from e1;
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 0.229 seconds

      hive> SELECT * from e2;
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 0.11 seconds

      Here is are the result to the indiviual queries (only the second query returns a result set):

      hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions FROM nikon
      WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
      (...)
      OK
      <- There are no results, this is normal
      Time taken: 41.471 seconds

      hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues FROM nikon
      WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
      (...)
      OK
      NULL 2
      1627575 25
      1627576 70
      1690950 22
      1690952 42
      1696705 199
      1696706 66
      1696730 229
      1696759 85
      1696893 218
      Time taken: 39.607 seconds

      1. HIVE-3699.D7743.3.patch
        84 kB
        Phabricator
      2. HIVE-3699.D7743.2.patch
        25 kB
        Phabricator
      3. HIVE-3699.D7743.1.patch
        21 kB
        Phabricator
      4. HIVE-3699_hive-0.9.1.patch.txt
        23 kB
        Navis

        Issue Links

          Activity

          No work has yet been logged on this issue.

            People

            • Assignee:
              Navis
              Reporter:
              Alexandre Fouché
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development