Description
I have a simple cube with a Fact Table and a Lookup Table. The Fact Table
includes some counter. The Lookup Table includes a region property which is
the highest level of the hierarchy and therefore can accumulate quite a lot
of data when summing the counter by region.
I am doing a query like this:
select sum(MY_COUNTER) as tot_count, REGION
from FACT_TABLE
join LOOKUP_TABLE
on FACT_TABLE.FK = LOOKUP_TABLE.PK
where date = '2015-01-01'
group by REGION
I get the following result:
-917,164,421 EMEA --> negative result!
777,795,808 AMER
514,879,134 APAC
117,814,485 LATAM
I ran the exact same query in Hive and got the following result:
3,381,591,228 EMEA --> big number but smaller than the BIGINT limit
778,718,563 AMER --> 922,755 difference with the Kylin result
520,253,610 APAC --> 5,374,476 difference with the Kylin result
117,913,857 LATAM --> 99,372 difference with the Kylin result
Based on that it seems that the limit is the int limit 2^31-1
(2,147,483,647) and that my negative result comes from an overflow.
In my cube I defined this measure as a bigint:
Expression Param Type Param Value Return Type
SUM column AVAIL_COUNT bigint
The other thing that worries me a bit is that the other numbers are not
100% accurate either (>100,000,000 difference!)
===================================================
reported by alex schufo alexschufo@gmail.com