Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
4.4.0
-
None
-
None
-
Linux
Description
Query result is incorrect when both index hint and limit are used.
To reproduce the problem,
1. Create an HBase table with a column family. The column family has 4 columns (a1, a2, a3, a4). Create an index on (a1, a2).
2. Populate the table with the following data
a1 a2 a3 a4 -- ----- ----- ----- 1 Small Red USA 1 Small Yellow UK 1 Small Green China
3. Run the following Phoenix queries, and the query results are correct.
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' ;
Three rows are returned as expected
select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
One row is returned as expected
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow";
One row is returned as expected
4. However, with the combination of index hint and limit clause, the query result is incorrect.
select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
Zero row is returned. The expected result is 1 row (i.e. the second row in the example).
The explain plan of the problematic query
+------------------------------------------+ | PLAN | +------------------------------------------+ | CLIENT 11-CHUNK PARALLEL 1-WAY ... MY_TABLE | | SERVER FILTER BY A.A3 = 'Yellow' | | CLIENT 1 ROW LIMIT | | SKIP-SCAN-JOIN TABLE 0 | | CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_INDEX [[52,48,48,45,69,79,84,45,48,48,57],23] | | SERVER FILTER BY FIRST KEY ONLY | | SERVER 1 ROW LIMIT | | CLIENT 1 ROW LIMIT | | DYNAMIC SERVER FILTER BY ("MY_ROW_KEY") IN (($148.$150, $1 | | JOIN-SCANNER 1 ROW LIMIT | +------------------------------------------+
It looks like the query plan uses the index first. Although all three rows match the index (a1 = '1' and 'a2 = 'Small'), but the intermediate result only has the first row due the "limit 1" clause. In this case, the intermediate result has one row (a3 = 'Red'). Then the query engine tries to apply the (a3 = 'Yellow') to the intermediate result, and get zero row as the final result.