Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Won't Do
Version: v3.1.2
Description
SQL(SUM, COUNT):
SELECT
  SUM(t1.a1),
  COUNT(1)
FROM
  T1 JOIN T2 ON...
  JOIN T3 ON...
  JOIN T4 ON...
  ...
  JOIN T9 ON...
WHERE
  T1.c1 = '10000'
  AND T1.date BETWEEN '2022-06-11' AND '2022-06-21'
  AND T9.b_type IN ('7', '11', '12');
Result:
Engine | sum | count
---|---|---
Hive | 2134980.9451 | 36330
Kylin | 1135892.3346 | 19765
With the T9 filter removed:
SELECT
  SUM(t1.a1),
  COUNT(1)
FROM
  T1 JOIN T2 ON...
  JOIN T3 ON...
  JOIN T4 ON...
  ...
  JOIN T9 ON...
WHERE
  T1.c1 = '10000'
  AND T1.date BETWEEN '2022-06-11' AND '2022-06-21';
Result:
Engine | sum | count
---|---|---
Hive | 3184089.5551 | 65333
Kylin | 3184089.5551 | 65333
In theory, Hive and Kylin should return the same results. Without the filter on T9 the results match, but once the filter is added Kylin loses rows: it returns a count of 19765 where Hive returns 36330.
Environment:
Hive:
There are nine tables in total. The main table (the fact table) is a partitioned table. Of the other eight tables, two are large tables with tens of millions of rows, and the rest are dimension tables, stored as bucketed tables.
Kylin:
Create Intermediate Flat Hive Table
Redistribute Flat Hive Table
Extract Fact Table Distinct Columns(Map Input)
Segment:
Source Count: ???
According to the logs, the data counts are the same.