Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
4.5.1
-
None
-
None
-
None
-
- Phoenix 4.5.1
- Hbase 1.1.2
- Hadoop 2.7.1
- JDK 1.7
Description
Table:
COST (
ACCOUNT_ID VARCHAR NOT NULL,
PRECISION TINYINT NOT NULL,
START_DATE TIMESTAMP NOT NULL,
SECONDARY_ACCOUNT_ID VARCHAR NOT NULL,
TAG VARCHAR NOT NULL,
VENDOR_ID VARCHAR NOT NULL,
SERVICE VARCHAR NOT NULL,
....
CONSTRAINT PK PRIMARY KEY (
ACCOUNT_ID,
PRECISION,
START_DATE,
SECONDARY_ACCOUNT_ID,
TAG,
VENDOR_ID,
SERVICE
) COMPRESSION='SNAPPY', SALT_BUCKETS=16;
When querying this table with a GROUP BY clause that contains the 'START_DATE" column, the results returned are incorrectly aggregated. I find multiple rows with the same "START_DATE" which should have been aggregated to a single row. The workaround is to do a TO_CHAR(START_DATE) which causes the query to return the correct results.
e.g:
---------------- select start_date, account_id, sum(quantity) as total, sum(discounted_quantity) as disc_total from COST where start_date >= (to_date('2016-03-01')) and precision = 1 and account_id in ('1234', '5678') group start_date, account_id --RESULT-- START_DATE ACCOUNT_ID TOTAL DISC_TOTAL ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 2016-03-01 16:00:00.0 1234 312 0 .... 2016-03-07 16:00:00.0 1234 312 0 2016-03-08 16:00:00.0 1234 312 0 2016-03-09 16:00:00.0 1234 216 0 2016-02-29 16:00:00.0 5678 194 24 2016-03-01 16:00:00.0 5678 262 0 .... 2016-03-07 16:00:00.0 5678 237 48 2016-03-08 16:00:00.0 5678 178 0 2016-03-09 16:00:00.0 1234 96 0 2016-03-09 16:00:00.0 5678 173 50 2016-03-10 16:00:00.0 1234 324 0 .... 2016-03-07 16:00:00.0 5678 178 24 2016-03-08 16:00:00.0 5678 218 0 2016-03-09 16:00:00.0 5678 218 48 ^^^^^^^^^^^^^^^^^^^^^^ Repeated rows --------------------
Workaround:
select to_char(start_date), account_id, sum(quantity) as total, sum(discounted_quantity) as disc_total from COST where start_date >= (to_date('2016-03-01')) and precision = 1 and account_id in ('1234', '5678') group to_char(start_date), account_id --RESULT-- START_DATE ACCOUNT_ID TOTAL DISC_TOTAL ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 2016-03-01 00:00:00.000 1234 312 0 2016-03-01 00:00:00.000 5678 3,465 384 2016-03-02 00:00:00.000 1234 312 0 2016-03-02 00:00:00.000 5678 3,238 384 2016-03-03 00:00:00.000 1234 312 0 2016-03-03 00:00:00.000 5678 3,243 384 2016-03-04 00:00:00.000 1234 312 0 2016-03-04 00:00:00.000 5678 3,212 384 2016-03-05 00:00:00.000 1234 312 0 2016-03-05 00:00:00.000 5678 2,907 384 2016-03-06 00:00:00.000 1234 312 0 2016-03-06 00:00:00.000 5678 2,907 384 2016-03-07 00:00:00.000 1234 312 0 2016-03-07 00:00:00.000 5678 2,919 384 2016-03-08 00:00:00.000 1234 312 0 2016-03-08 00:00:00.000 5678 2,955 384 2016-03-09 00:00:00.000 1234 312 0 2016-03-09 00:00:00.000 5678 2,955 384 2016-03-10 00:00:00.000 1234 312 0 2016-03-10 00:00:00.000 5678 2,978 384 2016-03-11 00:00:00.000 1234 324 0 2016-03-11 00:00:00.000 5678 3,125 384 2016-03-12 00:00:00.000 1234 336 0 2016-03-12 00:00:00.000 5678 3,102 384 2016-03-13 00:00:00.000 1234 336 0 2016-03-13 00:00:00.000 5678 3,117 384 2016-03-14 00:00:00.000 1234 153 0 2016-03-14 00:00:00.000 5678 2,187 278 -----------------