Details
Bug
Status: Resolved
Major
Resolution: Fixed
Impala 0.6
Description
I'm running the following two queries. The only difference between them is I'm using "LIKE" in one case and "=" in another, though there is no "%" in the LIKE, so the effect is the same. I was surprised to see approximately a 10x difference in performance between them.
Query: select v1, c, count(*) FROM xxx b, yyy a WHERE a.v1 = b.file AND v5 = "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
Returned 89 row(s) in 10.13s
Query: select v1, c, count(*) FROM xxx b, yyy a WHERE a.v1 = b.file AND v5 LIKE "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
Returned 89 row(s) in 93.76s
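To illustrate why a LIKE with no wildcard is equivalent to "=", here is a minimal Python sketch of the check a client or planner could make before choosing the cheaper comparison. It is not Impala's implementation. The function name `like_literal` is hypothetical, and the backslash escape character is an assumption. Note that "_" is a LIKE wildcard as well as "%".

```python
from typing import Optional


def like_literal(pattern: str, escape: str = "\\") -> Optional[str]:
    """Return the literal string a LIKE pattern matches, or None if the
    pattern contains an unescaped wildcard (% or _).

    Hypothetical helper for illustration only; the escape character is
    an assumption and may differ between SQL engines.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # An escape character followed by another character yields that
        # character literally.
        if ch == escape and i + 1 < len(pattern):
            out.append(pattern[i + 1])
            i += 2
            continue
        # An unescaped wildcard means the pattern is not a plain literal.
        if ch in ("%", "_"):
            return None
        out.append(ch)
        i += 1
    return "".join(out)


# The pattern from the queries above has no wildcard, so the predicate
# could be evaluated as v5 = "hostId".
print(like_literal("hostId"))
print(like_literal("host%"))
```

When the helper returns a string, `v5 LIKE pattern` and `v5 = <that string>` select the same rows, so the equality form can be used.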
I'm running impalad version 0.6 RELEASE (build e675301a90e370f694d700b395a13f0265b7f09c).
I've attached the two query profiles. The basic difference is in the execution rate:
- Averaged Fragment 2:(1m27s 0.00%)
- completion times: min:1m19s max:1m32s mean: 1m28s stddev:4s545ms
- execution rates: min:35.33 MB/sec max:41.00 MB/sec mean:37.37 MB/sec stddev:1.90 MB/sec
+ - RowsReturnedRate: 9.00 /sec
+ Averaged Fragment 2:(7s906ms 0.00%)
+ completion times: min:7s620ms max:9s495ms mean: 8s056ms stddev:653ms
+ execution rates: min:342.95 MB/sec max:436.42 MB/sec mean:409.84 MB/sec stddev:31.25 MB/sec
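As a quick sanity check on the "approximately 10x" figure, the ratios can be computed directly from the numbers reported above. This is only arithmetic on the posted values; the variable names are illustrative.

```python
# Wall-clock times reported by the shell for the two queries, in seconds.
fast_query_s = 10.13
slow_query_s = 93.76

# Mean execution rates of Averaged Fragment 2 in the two profiles, in MB/sec.
fast_rate_mb_s = 409.84
slow_rate_mb_s = 37.37

wall_clock_ratio = slow_query_s / fast_query_s
rate_ratio = fast_rate_mb_s / slow_rate_mb_s

print(f"wall-clock slowdown: {wall_clock_ratio:.1f}x")
print(f"fragment execution-rate gap: {rate_ratio:.1f}x")
```

The end-to-end slowdown comes out at roughly 9.3x and the fragment execution-rate gap at roughly 11x, consistent with the difference being concentrated in that fragment.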
Obviously I've fixed my query.