[HIVE-10841] [WHERE col is not null] does not work sometimes for queries with many JOIN statements - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0, 0.13.1, 0.14.0, 1.2.0, 1.3.0
Fix Version/s: 1.0.2, 1.2.1, 1.3.0, 2.0.0
Component/s: Query Planning, Query Processor
Labels: None

Description

The following SELECT query returns 3 rows, but it should return 1 row.
I checked the same query in MySQL and it returned 1 row.

To reproduce the issue in Hive:
1. Prepare the tables

drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;

create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;

create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);

2. Run the SELECT query

select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;

The result is 3 rows:

10	122
NULL	NULL
NULL	NULL

but it should be 1 row:

10	122
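
All join keys match and only one acct row has a non-null brn, so a single row is expected. A minimal sketch of that cross-check, assuming Python's built-in sqlite3 module as a stand-in reference engine (it is neither Hive nor MySQL, and the explicit column types are an assumption added for illustration):

```python
import sqlite3

# Same data as the report; column types are assumed to be int.
SETUP = """
create table L    (id int);
create table LA   (loan_id int, aid int, pi_id int);
create table FR   (loan_id int);
create table A    (id int);
create table PI   (id int);
create table acct (aid int, acc_n int, brn int);

insert into L    values (4436);
insert into LA   values (4436, 4748, 4415);
insert into FR   values (4436);
insert into A    values (4748);
insert into PI   values (4415);
insert into acct values (4748, 10, 122), (4748, null, null), (4748, null, null);
"""

# The query from the report, unchanged apart from whitespace.
QUERY = """
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436
  and acct.brn is not null
"""

def expected_rows():
    """Run the report's query on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

rows = expected_rows()
print(rows)  # [(10, 122)]
```

The two acct rows with NULL brn are rejected by the WHERE clause, which is the behaviour Hive should match.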

2.1 "explain select ..." output for hive-1.3.0 MR

STAGE DEPENDENCIES:
  Stage-12 is a root stage
  Stage-9 depends on stages: Stage-12
  Stage-0 depends on stages: Stage-9

STAGE PLANS:
  Stage: Stage-12
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a 
          Fetch Operator
            limit: -1
        acct 
          Fetch Operator
            limit: -1
        fr 
          Fetch Operator
            limit: -1
        l 
          Fetch Operator
            limit: -1
        pi 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a 
          TableScan
            alias: a
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        acct 
          TableScan
            alias: acct
            Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: aid is not null (type: boolean)
              Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        fr 
          TableScan
            alias: fr
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (loan_id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        l 
          TableScan
            alias: l
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        pi 
          TableScan
            alias: pi
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col6 (type: int)
                  1 id (type: int)

  Stage: Stage-9
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: la
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((loan_id is not null and aid is not null) and pi_id is not null) and (loan_id = 4436)) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                     Inner Join 0 to 2
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
                outputColumnNames: _col5, _col6
                Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                       Inner Join 1 to 2
                  keys:
                    0 _col5 (type: int)
                    1 id (type: int)
                    2 aid (type: int)
                  outputColumnNames: _col6, _col19, _col20
                  Statistics: Num rows: 4 Data size: 17 Basic stats: COMPLETE Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 _col6 (type: int)
                      1 id (type: int)
                    outputColumnNames: _col19, _col20
                    Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col19 (type: int), _col20 (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 4 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.57 seconds, Fetched: 142 row(s)
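
Note that the plan above contains no predicate on brn: the acct scan filters only on aid. A rough way to check this mechanically, assuming the EXPLAIN text is available as a string (the helper names and the abridged excerpt below are illustrative and not part of Hive):

```python
import re

# Abridged excerpt of the hive-1.3.0 MR plan: only the acct and la filters.
PLAN_EXCERPT = """\
          TableScan
            alias: acct
            Filter Operator
              predicate: aid is not null (type: boolean)
          TableScan
            alias: la
            Filter Operator
              predicate: (((loan_id is not null and aid is not null) and pi_id is not null) and (loan_id = 4436)) (type: boolean)
"""

PREDICATE_LINE = re.compile(
    r"^[ \t]*predicate:[ \t]*(.*?)[ \t]*\(type: boolean\)[ \t]*$",
    re.MULTILINE,
)

def predicates(plan_text):
    """Return the expression text of every 'predicate:' line in the plan."""
    return [m.group(1) for m in PREDICATE_LINE.finditer(plan_text)]

def mentions_column(plan_text, column):
    """True if any filter predicate in the plan references the column name."""
    word = re.compile(r"\b" + re.escape(column) + r"\b", re.IGNORECASE)
    return any(word.search(p) for p in predicates(plan_text))

found = predicates(PLAN_EXCERPT)
brn_filtered = mentions_column(PLAN_EXCERPT, "brn")
aid_filtered = mentions_column(PLAN_EXCERPT, "aid")
print(len(found), brn_filtered, aid_filtered)  # 2 False True
```

Running the same check on the full plan text should give the same answer for brn, since no predicate line in it mentions that column.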

2.2 "explain select ..." output for hive-0.13.1 Tez

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)
        Reducer 6 <- Map 5 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE)
      DagName: lcapp_20150528111717_06c57a5b-8dc6-4ce9-bce7-b9e0a7818fe4:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: acct
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: aid (type: int)
                    sort order: +
                    Map-reduce partition columns: aid (type: int)
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                    value expressions: acc_n (type: int), brn (type: int)
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: id (type: int)
                    sort order: +
                    Map-reduce partition columns: id (type: int)
                    Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
        Map 5 
            Map Operator Tree:
                TableScan
                  alias: la
                  Statistics: Num rows: 28 Data size: 347 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (loan_id = 4436) (type: boolean)
                    Statistics: Num rows: 14 Data size: 173 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: loan_id (type: int)
                      sort order: +
                      Map-reduce partition columns: loan_id (type: int)
                      Statistics: Num rows: 14 Data size: 173 Basic stats: COMPLETE Column stats: NONE
                      value expressions: aid (type: int), pi_id (type: int)
        Map 7 
            Map Operator Tree:
                TableScan
                  alias: fr
                  Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (loan_id = 4436) (type: boolean)
                    Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: loan_id (type: int)
                      sort order: +
                      Map-reduce partition columns: loan_id (type: int)
                      Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: l
                  Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (id = 4436) (type: boolean)
                    Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: id (type: int)
                      sort order: +
                      Map-reduce partition columns: id (type: int)
                      Statistics: Num rows: 23 Data size: 93 Basic stats: COMPLETE Column stats: NONE
        Map 9 
            Map Operator Tree:
                TableScan
                  alias: pi
                  Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: id (type: int)
                    sort order: +
                    Map-reduce partition columns: id (type: int)
                    Statistics: Num rows: 46 Data size: 187 Basic stats: COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                     Inner Join 1 to 2
                condition expressions:
                  0 {VALUE._col2}
                  1 
                  2 {VALUE._col1} {VALUE._col2}
                outputColumnNames: _col2, _col15, _col16
                Statistics: Num rows: 110 Data size: 448 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col2 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col2 (type: int)
                  Statistics: Num rows: 110 Data size: 448 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col15 (type: int), _col16 (type: int)
        Reducer 3 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col1} {VALUE._col2}
                  1 
                outputColumnNames: _col1, _col2
                Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col1 (type: int), _col2 (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 121 Data size: 492 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 6 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                     Inner Join 0 to 2
                condition expressions:
                  0 
                  1 {VALUE._col1} {VALUE._col2}
                  2 
                outputColumnNames: _col4, _col5
                Statistics: Num rows: 50 Data size: 204 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col4 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col4 (type: int)
                  Statistics: Num rows: 50 Data size: 204 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col5 (type: int)

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 1.377 seconds, Fetched: 146 row(s)

3. The workaround is to move "acct.brn is not null" from the WHERE clause into the join condition

select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid and acct.brn is not null
WHERE
  L.id = 4436;

OK
10	122
Time taken: 23.479 seconds, Fetched: 1 row(s)
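
For inner joins, a filter in the ON clause and the same filter in the WHERE clause are equivalent, so the workaround does not change the meaning of the query. A small sketch comparing the two forms, again assuming Python's sqlite3 as a stand-in reference engine rather than Hive (column types are assumed):

```python
import sqlite3

# Same data as the report; column types are assumed to be int.
SETUP = """
create table L    (id int);
create table LA   (loan_id int, aid int, pi_id int);
create table FR   (loan_id int);
create table A    (id int);
create table PI   (id int);
create table acct (aid int, acc_n int, brn int);

insert into L    values (4436);
insert into LA   values (4436, 4748, 4415);
insert into FR   values (4436);
insert into A    values (4748);
insert into PI   values (4415);
insert into acct values (4748, 10, 122), (4748, null, null), (4748, null, null);
"""

# Joins shared by both forms of the query.
JOINS = """
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
"""

# Original form: the NOT NULL filter sits in the WHERE clause.
WHERE_FORM = (
    "select acct.acc_n, acct.brn" + JOINS +
    "JOIN acct ON A.id = acct.aid "
    "WHERE L.id = 4436 and acct.brn is not null"
)

# Workaround form: the NOT NULL filter sits in the join condition.
ON_FORM = (
    "select acct.acc_n, acct.brn" + JOINS +
    "JOIN acct ON A.id = acct.aid and acct.brn is not null "
    "WHERE L.id = 4436"
)

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
where_rows = conn.execute(WHERE_FORM).fetchall()
on_rows = conn.execute(ON_FORM).fetchall()
conn.close()

print(where_rows, on_rows)  # [(10, 122)] [(10, 122)]
```

Both forms return the single expected row on the reference engine; on the affected Hive versions only the ON form does.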

I tried hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez); all combinations show the issue.

Attachments

HIVE-10841.03.patch (11/Jun/15 10:52, 16 kB, jcamachorodriguez)
HIVE-10841.1.patch (06/Jun/15 02:01, 3 kB, Laljo John Pullokkaran)
HIVE-10841.2.patch (11/Jun/15 04:26, 2 kB, Laljo John Pullokkaran)
HIVE-10841.patch (02/Jun/15 07:55, 0.7 kB, Laljo John Pullokkaran)

Issue Links

is duplicated by: HIVE-11034 Joining multiple tables producing different results with different order of join (Resolved)

is related to: HIVE-3847 ppd.remove.duplicatefilters removing filters too aggressively (Closed)

relates to: HIVE-4293 Predicates following UDTF operator are removed by PPD (Resolved)

People

Assignee: Laljo John Pullokkaran

Reporter: Alexander Pivovarov

Votes: 0

Watchers: 12

Dates

Created: 27/May/15 23:05

Updated: 27/Feb/24 22:23

Resolved: 15/Jun/15 19:38