Details
Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Impala 2.5.0
None
Description
Runtime filters from the same source arrive ~5 seconds apart at the same instance. It seems that the coordinator is either serializing the filters or is network bound.
Query
select count(*) rowcount
from store_sales a, store_returns b
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
group by ss_cdemo_sk, ss_store_sk, ss_item_sk, ss_ticket_number
having count(*) > 1
Subplan
|  00:SCAN HDFS [tpcds_3000_parquet.store_sales a, RANDOM]
|     partitions=683/1824 files=944 size=126.77GB
|     runtime filters: RF000 -> a.ss_item_sk, RF001 -> a.ss_ticket_number
|     table stats: 8639936081 rows total
|     column stats: all
|     hosts=61 per-host-mem=352.00MB
|     tuple-ids=0 row-size=24B cardinality=2886246552
Filter table
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived  Completed
-------------------------------------------------------------------------------------------------------------------
  1          2             0       61       REMOTE             false              0 (61)        2s881ms   10s265ms
  0          2             0       61       REMOTE             false              0 (61)        3s698ms   10s350ms
Filters arriving at different times
Instance 614bea9715cbde44:b0134609741aea61 (host=impala-compete-64-5.vpc.cloudera.com:22000):(Total: 30s446ms, non-child: 10s882ms, % non-child: 35.74%)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:16/2.33 GB
  Filter 1 arrival: 11s854ms
  Filter 0 arrival: 16s047ms
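
To make the gap concrete, here is a minimal Python sketch that converts the duration strings quoted in this report into milliseconds and subtracts them. The helper `to_ms` is hypothetical and not part of Impala; it assumes durations are built only from `h`, `m`, `s`, and `ms` components, which covers the values shown here but may not cover every format a profile can print.

```python
import re

# One "<number><unit>" component, e.g. "11s" or "854ms".
# "ms" is listed before "m" so that "854ms" is not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def to_ms(text):
    """Convert a duration such as '11s854ms' to integer milliseconds.

    Hypothetical helper: assumes only h/m/s/ms units appear.
    """
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    total = 0.0
    for value, unit in _PART.findall(text):
        total += float(value) * _UNIT_MS[unit]
    return round(total)


# Values copied from the instance profile in this report.
arrival_gap_ms = to_ms("16s047ms") - to_ms("11s854ms")

# Values copied from the "Completed" column of the filter table.
completed_gap_ms = to_ms("10s350ms") - to_ms("10s265ms")

print("gap between arrivals at the instance:", arrival_gap_ms, "ms")
print("gap between completion at the coordinator:", completed_gap_ms, "ms")
```

With the numbers above, the two filters complete 85 ms apart in the filter table but arrive 4193 ms apart at this instance.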