Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-2601

Query result is incorrect when both index hint and limit are used

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 4.4.0
    • 4.7.0
    • None
    • None
    • Linux

    Description

      Query result is incorrect when both index hint and limit are used.

      To reproduce the problem,
      1. Create an HBase table with a column family. The column family has 4 columns (a1, a2, a3, a4). Create an index on (a1, a2).

      2. Populate the table with the following data

      a1  a2     a3      a4
      --  -----  -----   -----
      1   Small  Red     USA
      1   Small  Yellow  UK
      1   Small  Green   China
      

      3. Run the following Phoenix queries, and the query results are correct.

      select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' ;
      

      Three rows are returned as expected

      select a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
      

      One row is returned as expected

      select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow";
      

      One row is returned as expected

      4. However, with the combination of index hint and limit clause, the query result is incorrect.

      select /*+ INDEX(add_index_hint_here) */ a1, a2, a3, a4 from table where a1 = '1' and a2 = 'Small' and a3 = "Yellow" limit 1;
      

      Zero row is returned. The expected result is 1 row (i.e. the second row in the example).

      The explain plan of the problematic query

      +------------------------------------------+
      |                   PLAN                   |
      +------------------------------------------+
      | CLIENT 11-CHUNK PARALLEL 1-WAY ... MY_TABLE |
      |     SERVER FILTER BY A.A3 = 'Yellow' |
      | CLIENT 1 ROW LIMIT                       |
      |     SKIP-SCAN-JOIN TABLE 0               |
      |         CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_INDEX [[52,48,48,45,69,79,84,45,48,48,57],23] |
      |             SERVER FILTER BY FIRST KEY ONLY |
      |             SERVER 1 ROW LIMIT           |
      |         CLIENT 1 ROW LIMIT               |
      |     DYNAMIC SERVER FILTER BY ("MY_ROW_KEY") IN (($148.$150, $1 |
      |     JOIN-SCANNER 1 ROW LIMIT             |
      +------------------------------------------+
      

      It looks like the query plan uses the index first. Although all three rows match the index (a1 = '1' and 'a2 = 'Small'), but the intermediate result only has the first row due the "limit 1" clause. In this case, the intermediate result has one row (a3 = 'Red'). Then the query engine tries to apply the (a3 = 'Yellow') to the intermediate result, and get zero row as the final result.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            maryannxue Wei Xue
            simonlee Simon Lee
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment