Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Duplicate
-
1.2.0, 3.0.0
-
None
-
None
Description
Repro steps:
drop database if exists ax1 cascade; create database ax1; use ax1; CREATE TABLE tmp1 ( v1 string , v2 string , v3 string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' ; INSERT INTO tmp1 VALUES ('a', 'b', 'c1') , ('a', 'b', 'c2') , ('d', 'e', 'f') , ('g', 'h', 'i') ; CREATE TABLE tmp_grouped_by_one_col ( v1 string , cnt__v2 int , cnt__v3 int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' ; CREATE TABLE tmp_grouped_by_two_col ( v1 string , v2 string , cnt__v3 int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' ; CREATE TABLE tmp_grouped_by_all_col ( v1 string , v2 string , v3 string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' ; FROM tmp1 INSERT INTO tmp_grouped_by_one_col SELECT v1, count(distinct v2), count(distinct v3) GROUP BY v1 INSERT INTO tmp_grouped_by_all_col SELECT v1, v2, v3 GROUP BY v1, v2, v3 ; select 'tmp_grouped_by_one_col',count(*) from tmp_grouped_by_one_col union all select 'tmp_grouped_by_two_col',count(*) from tmp_grouped_by_two_col union all select 'tmp_grouped_by_all_col',count(*) from tmp_grouped_by_all_col; select * from tmp_grouped_by_all_col;
tmp_grouped_by_all_col table should have 4 reocrds but it loads 7 records into the table.
+----------------------------+----------------------------+----------------------------+--+ | tmp_grouped_by_all_col.v1 | tmp_grouped_by_all_col.v2 | tmp_grouped_by_all_col.v3 | +----------------------------+----------------------------+----------------------------+--+ | a | b | b | | a | c1 | c1 | | a | c2 | c2 | | d | e | e | | d | f | f | | g | h | h | | g | i | i | +----------------------------+----------------------------+----------------------------+--+
Attachments
Issue Links
- duplicates
-
HIVE-19690 multi-insert query with multiple GBY, and distinct in only some branches can produce incorrect results
- Closed