Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Kudu_Impala
Description
When running queries under load, we observe that the KuduScanNode can "miss" some scan ranges and therefore produce incorrect results.
With TPC-H queries running on the minicluster at a concurrency of 4, q4 fails fairly regularly with incorrect results. In the failing runs, the scan node with id=0 should scan a total of 9 ranges but scans only 4. Profiles are attached.
Correct execution:
KUDU_SCAN_NODE (id=0):(Total: 140.212ms, non-child: 140.212ms, % non-child: 100.00%)
   - BytesRead: 0
   - NumScannerThreadsStarted: 5 (5)
   - PeakMemoryUsage: 6.68 MB (7004160)
   - RowsRead: 1.16M (1160553)
   - RowsReturned: 57.22K (57218)
   - RowsReturnedRate: 408.08 K/sec
   - ScanRangesComplete: 9 (9)
   - ScannerThreadsInvoluntaryContextSwitches: 992 (992)
   - ScannerThreadsTotalWallClockTime: 19s046ms
   - MaterializeTupleTime(*): 85.248ms
   - ScannerThreadsSysTime: 607.000us
   - ScannerThreadsUserTime: 1s570ms
   - TotalKuduReadTime: 847.033ms
   - ScannerThreadsVoluntaryContextSwitches: 109 (109)
   - TotalKuduScanRoundTrips: 72 (72)
   - TotalReadThroughput: 0.00 /sec
Incorrect execution, from another run under load, with scan ranges missing:
KUDU_SCAN_NODE (id=0):(Total: 85.715ms, non-child: 85.715ms, % non-child: 100.00%)
   - BytesRead: 0
   - NumScannerThreadsStarted: 1 (1)
   - PeakMemoryUsage: 4.16 MB (4358144)
   - RowsRead: 516.18K (516184)
   - RowsReturned: 25.63K (25630)
   - RowsReturnedRate: 299.01 K/sec
   - ScanRangesComplete: 4 (4)
   - ScannerThreadsInvoluntaryContextSwitches: 406 (406)
   - ScannerThreadsTotalWallClockTime: 1s039ms
   - MaterializeTupleTime(*): 248.612us
   - ScannerThreadsSysTime: 3.907ms
   - ScannerThreadsUserTime: 695.542ms
   - TotalKuduReadTime: 417.162ms
   - ScannerThreadsVoluntaryContextSwitches: 53 (53)
   - TotalKuduScanRoundTrips: 32 (32)
   - TotalReadThroughput: 0.00 /sec
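For anyone trying to reproduce this, the two profiles can be compared mechanically rather than by eye. The following is a minimal sketch, not part of Impala: the helper names `raw_counters` and `missing_scan_ranges` are hypothetical, the profile strings are abridged copies of the two excerpts in this report, and the parser assumes counters are printed as `Name: pretty (raw)` and separated by ` - `, as they are here.

```python
import re


def raw_counters(profile_text):
    """Return {counter_name: raw_int} for counters printed as 'Name: pretty (raw)'.

    Counters without a parenthesized raw value (timers, rates, 'BytesRead: 0')
    are skipped. Hypothetical helper; assumes the layout seen in this report.
    """
    counters = {}
    # Counters are separated by whitespace, a dash, and whitespace. This works
    # for both one-counter-per-line and single-line flattened profiles.
    for item in re.split(r"\s+-\s+", profile_text.strip()):
        m = re.match(r"(\w+)(?:\(\*\))?:\s*.*\((\d+)\)\s*$", item)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def missing_scan_ranges(good_profile, bad_profile):
    """Number of scan ranges completed in the good run but not in the bad run."""
    good = raw_counters(good_profile)["ScanRangesComplete"]
    bad = raw_counters(bad_profile)["ScanRangesComplete"]
    return good - bad


# Abridged excerpts of the two KUDU_SCAN_NODE (id=0) profiles above.
good_profile = (
    "KUDU_SCAN_NODE (id=0): - BytesRead: 0 - NumScannerThreadsStarted: 5 (5)"
    " - RowsRead: 1.16M (1160553) - ScanRangesComplete: 9 (9)"
)
bad_profile = (
    "KUDU_SCAN_NODE (id=0): - BytesRead: 0 - NumScannerThreadsStarted: 1 (1)"
    " - RowsRead: 516.18K (516184) - ScanRangesComplete: 4 (4)"
)

print(missing_scan_ranges(good_profile, bad_profile))
```

A nonzero difference in `ScanRangesComplete` for the same query and scan node, as in the two runs above, is the symptom described in this issue.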