Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.13.0, 0.13.1, 0.14.0, 1.2.0, 1.3.0
-
None
Description
The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.
To reproduce the issue in Hive
1. prepare tables
drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null);
2. run SELECT query
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
L.id = 4436
and acct.brn is not null;
the result is 3 rows
10 122 NULL NULL NULL NULL
but it should be 1 row
10 122
2.1 "explain select ..." output for hive-1.3.0 MR
STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias -> Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias -> Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col6 (type: int) 1 id (type: int) Stage: Stage-9 Map Reduce Map Operator Tree: TableScan alias: la Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((loan_id is not null and aid is not null) and pi_id is not null) and (loan_id = 4436)) (type: boolean) Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) outputColumnNames: _col5, _col6 Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 Inner Join 1 to 2 keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) outputColumnNames: _col6, _col19, _col20 Statistics: Num rows: 4 Data size: 17 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col6 (type: int) 1 id (type: int) outputColumnNames: _col19, _col20 Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col19 (type: int), _col20 (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink Time taken: 0.57 seconds, Fetched: 142 row(s)
2.2. "explain select..." output for hive-0.13.1 Tez
STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE) Reducer 6 <- Map 5 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE) DagName: lcapp_20150528111717_06c57a5b-8dc6-4ce9-bce7-b9e0a7818fe4:1 Vertices: Map 1 Map Operator Tree: TableScan alias: acct Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: aid (type: int) sort order: + Map-reduce partition columns: aid (type: int) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE value expressions: acc_n (type: int), brn (type: int) Map 4 Map Operator Tree: TableScan alias: a Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: id (type: int) sort order: + Map-reduce partition columns: id (type: int) Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Map 5 Map Operator Tree: TableScan alias: la Statistics: Num rows: 28 Data size: 347 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 14 Data size: 173 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: loan_id (type: int) sort order: + Map-reduce partition columns: loan_id (type: int) Statistics: Num rows: 14 Data size: 173 Basic stats: COMPLETE Column stats: NONE value expressions: aid (type: int), pi_id (type: int) Map 7 Map Operator Tree: TableScan alias: fr Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: loan_id (type: int) sort order: + Map-reduce partition columns: loan_id (type: int) Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE Map 8 Map Operator Tree: TableScan alias: l Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: id (type: int) sort order: + Map-reduce partition columns: id (type: int) Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE Map 9 Map Operator Tree: TableScan alias: pi Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: id (type: int) sort order: + Map-reduce partition columns: id (type: int) Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE Reducer 2 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 Inner Join 1 to 2 condition expressions: 0 {VALUE._col2} 1 2 {VALUE._col1} {VALUE._col2} outputColumnNames: _col2, _col15, _col16 Statistics: Num rows: 110 Data size: 448 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col2 (type: int) sort order: + Map-reduce partition columns: _col2 (type: int) Statistics: Num rows: 110 Data size: 448 Basic stats: COMPLETE Column stats: NONE value expressions: _col15 (type: int), _col16 (type: int) Reducer 3 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} {VALUE._col2} 1 outputColumnNames: _col1, _col2 Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col1 (type: int), _col2 (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Reducer 6 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 condition expressions: 0 1 {VALUE._col1} {VALUE._col2} 2 outputColumnNames: _col4, _col5 Statistics: Num rows: 50 Data size: 204 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col4 (type: int) sort order: + Map-reduce partition columns: _col4 (type: int) Statistics: Num rows: 50 Data size: 204 Basic stats: COMPLETE Column stats: NONE value expressions: _col5 (type: int) Stage: Stage-0 Fetch Operator limit: -1 Time taken: 1.377 seconds, Fetched: 146 row(s)
3. The workaround is to put "acct.brn is not null" to join condition
select
acct.ACC_N,
acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
L.id = 4436;
OK
10 122
Time taken: 23.479 seconds, Fetched: 1 row(s)
I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue
Attachments
Attachments
Issue Links
- is duplicated by
-
HIVE-11034 Joining multiple tables producing different results with different order of join
- Resolved
- is related to
-
HIVE-3847 ppd.remove.duplicatefilters removing filters too aggressively
- Closed
- relates to
-
HIVE-4293 Predicates following UDTF operator are removed by PPD
- Resolved
- links to