Affects Version/s: None
Fix Version/s: None
Release Note:Not for the current sprint. Schedule for after June 2013.
OR filters have been optimized to run with vectorized query execution. IN filters of the form "Column IN (list-of-constants)" are a special case of OR. However, IN does not vectorized currently.
select ddate, count(*) from factsqlengineam_vec_orc where ddate = "2012-05-19 00:00:00" OR ddate = "2012-05-20 00:00:00" or ddate = "2012-05-21 00:00:00" group by ddate;
takes about 23 seconds of CPU and
select ddate, count(*) from factsqlengineam_vec_orc where ddate IN ("2012-05-19 00:00:00", "2012-05-20 00:00:00", "2012-05-21 00:00:00") group by ddate;
takes about 153 seconds of CPU.
A simple fix may be that for short IN lists (say <= 64 elements) we turn them into OR by manipulating the query tree before planning whether vectorization can be used.
A more complex fix that covers more cases would be to turn longer IN lists into a join so when we eventually support vectorized joins it will be fast.
An intermediate approach might be to implement a special IN filter operator that stores the constant values in a sorted array or high-performance hash table (like Cuckoo hashing).