Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
Cloudera 4.1 on Amazon Linux (rebranded Centos 6): hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
-
Reviewed
Description
(Note: This might be related to HIVE-2750)
I am doing a query with multiple INSERT OVERWRITE to multiple tables in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the temp tables.
Weird enough, if i had further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables, even the ones that would have been wrong content are all correctly populated.
This is the misbehaving query:
FROM nikon
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
;
It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below)
hive> SELECT * from e1;
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.229 seconds
hive> SELECT * from e2;
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.11 seconds
Here is are the result to the indiviual queries (only the second query returns a result set):
hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Impressions FROM nikon
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
(...)
OK
<- There are no results, this is normal
Time taken: 41.471 seconds
hive> SELECT qs_cs_s_aid AS Emplacements, COUNT AS Vues FROM nikon
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
(...)
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 39.607 seconds
Attachments
Attachments
Issue Links
- is duplicated by
-
HIVE-4173 Hive Ignoring where clause for multitable insert
- Resolved