Description
Hi,
I have a static table in Kudu, no inserts/updates or deletes are running on the cluster. The query returns DIFFERENT result when I change the where condition on one of the primary key columns, which is in the group_by list.
The created_date is part of the PK and is type of int.
PK contains subscriber, time, date, identifier and created_date.
I tried to check if the inserted count is equal to the HDFS table, and noticed on one day, that the count differs based on the where criteria!!
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180601 group by created_date;
Query: select created_date, count from base.usage_kudu where created_date >= 20180601 group by created_date
Query submitted at: 2018-06-04 21:06:30 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=ce4e92eda5aaa02f:ea07aa4600000000
----------------------+
| created_date | count |
----------------------+
| 20180603 | 12145131 |
| 20180601 | 18076448 | -> 195k MORE!!!
| 20180602 | 13325080 |
| 20180604 | 3788161 |
----------------------+
Fetched 4 row(s) in 0.37s
[10.197.0.164:21000] >
[10.197.0.164:21000] >
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180601 group by created_date order by 1;
Query: select created_date, count from base.usage_kudu where created_date >= 20180601 group by created_date order by 1
Query submitted at: 2018-06-04 21:06:55 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=d541a9dda19e28e4:be4a2ca000000000
----------------------+
| created_date | count |
----------------------+
| 20180601 | 18076448 | -> 195k MORE!!!
| 20180602 | 13325080 |
| 20180603 | 12145131 |
| 20180604 | 3788161 |
----------------------+
Fetched 4 row(s) in 1.14s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180528 group by created_date order by 1;
Query: select created_date, count from base.usage_kudu where created_date >= 20180528 group by created_date order by 1
Query submitted at: 2018-06-04 21:07:12 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=774a261fb94ad2bb:aab28b8b00000000
----------------------+
| created_date | count |
----------------------+
| 20180528 | 17607943 |
| 20180529 | 20741097 |
| 20180530 | 17362364 |
| 20180531 | 16877228 |
{{| 20180601 | 17925671 | -> 44k MORE!! }}
| 20180602 | 13325080 |
| 20180603 | 12145131 |
| 20180604 | 3788161 |
----------------------+
Fetched 8 row(s) in 0.67s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;
Query: select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1
Query submitted at: 2018-06-04 21:07:25 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=38483ad3ae5c8eb9:a538cb6300000000
----------------------+
| created_date | count |
----------------------+
| 20180525 | 22309857 |
| 20180526 | 15268520 |
| 20180527 | 14939691 |
| 20180528 | 17607943 |
| 20180529 | 20741097 |
| 20180530 | 17362364 |
| 20180531 | 16903829 |
| 20180601 | 18047010 | -> 165k MORE!!!
| 20180602 | 13325080 |
| 20180603 | 12145131 |
| 20180604 | 3788161 |
----------------------+
Fetched 11 row(s) in 0.85s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date;
Query: select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date
Query submitted at: 2018-06-04 21:07:42 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=7343ba31f6b4c86f:621a7b8c00000000
----------------------+
| created_date | count |
----------------------+
| 20180601 | 17881253 | -> CORRECT ONE
----------------------+
Fetched 1 row(s) in 0.27s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;
Query: select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1
Query submitted at: 2018-06-04 21:12:02 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=4141df26117f35c3:9ab2f0700000000
----------------------+
| created_date | count |
----------------------+
| 20180525 | 22309857 |
| 20180526 | 15268520 |
| 20180527 | 14939691 |
| 20180528 | 17607943 |
| 20180529 | 20741097 |
| 20180530 | 17362364 |
| 20180531 | 16903829 |
| 20180601 | 18047010 | -> AGAIN WRONG RESULT!!
| 20180602 | 13325080 |
| 20180603 | 12145131 |
| 20180604 | 3788161 |
----------------------+
Fetched 11 row(s) in 1.04s}}{{Again, no other inserts/selects/updates or deletes were running between these statements on the cluster.
I checked the explain, if there is a difference,but it looks ok. But the result is different!
[10.197.0.164:21000] > explain select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date;
Query: explain select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date
--------------------------------------------------
| Explain String |
--------------------------------------------------
| Max Per-Host Resource Reservation: Memory=3.94MB |
| Per-Host Resource Estimates: Memory=20.00MB |
| |
| PLAN-ROOT SINK |
| | |
| 04:EXCHANGE [UNPARTITIONED] |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count:merge |
| | group by: created_date |
| | |
| 02:EXCHANGE [HASH(created_date)] |
| | |
| 01:AGGREGATE [STREAMING] |
| | output: count |
| | group by: created_date |
| | |
| 00:SCAN KUDU [base.usage_kudu] |
| kudu predicates: created_date = 20180601 |
--------------------------------------------------
Fetched 19 row(s) in 0.06s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date;
Query: select created_date, count from base.usage_kudu where created_date = 20180601 group by created_date
Query submitted at: 2018-06-04 21:17:21 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=c449aabea51e7456:612f096400000000
----------------------+
| created_date | count |
----------------------+
| 20180601 | 17881253 |
----------------------+
Fetched 1 row(s) in 0.38s
[10.197.0.164:21000] > explain select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;
Query: explain select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1
--------------------------------------------------
| Explain String |
--------------------------------------------------
| Max Per-Host Resource Reservation: Memory=9.94MB |
| Per-Host Resource Estimates: Memory=26.00MB |
| |
| PLAN-ROOT SINK |
| | |
| 05:MERGING-EXCHANGE [UNPARTITIONED] |
| | order by: created_date ASC |
| | |
| 02:SORT |
| | order by: created_date ASC |
| | |
| 04:AGGREGATE [FINALIZE] |
| | output: count:merge |
| | group by: created_date |
| | |
| 03:EXCHANGE [HASH(created_date)] |
| | |
| 01:AGGREGATE [STREAMING] |
| | output: count |
| | group by: created_date |
| | |
| 00:SCAN KUDU [base.usage_kudu] |
| kudu predicates: created_date >= 20180525 |
--------------------------------------------------
Fetched 23 row(s) in 0.05s
[10.197.0.164:21000] > select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;
Query: select created_date, count from base.usage_kudu where created_date >= 20180525 group by created_date order by 1
Query submitted at: 2018-06-04 21:17:32 (Coordinator: http://ip-10-197-0-164.eu-west-1.compute.internal:25000)
Query progress can be monitored at: http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=bc4a36f2a7ad3280:c7b09a5100000000
----------------------+
| created_date | count |
----------------------+
| 20180525 | 22309857 |
| 20180526 | 15268520 |
| 20180527 | 14939691 |
| 20180528 | 17607943 |
| 20180529 | 20741097 |
| 20180530 | 17362364 |
| 20180531 | 16903829 |
| 20180601 | 18047010 |
| 20180602 | 13325080 |
| 20180603 | 12145131 |
| 20180604 | 3788161 |
----------------------+
Fetched 11 row(s) in 0.88s
Attachments
Attachments
Issue Links
- causes
-
KUDU-2233 Check failure during compactions: pv_delete_redo != nullptr
- Resolved