Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-4504

Subquery with ORDER BY on salted table gives wrong results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.11.0
    • None
    • None
    • None
    • amazon emr phoenix 4.11.0 hbase 1.3

    Description

      Probably it is already fixed. Having a quick search I didn't find exact problem description.

      I have a table:

      create immutable table product_history_v3 (
              ts bigint not null,
              id varchar not null,
              product varchar,
              merchantid varchar,
              storeid varchar,
              constraint pk primary key (ts, id)
      ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000,
              versions=1,update_cache_frequency=1000,append_only_schema=true,
              guid_posts_width=10000000,
      SALT_BUCKETS=20;
      create local index product_history_v3_id_ts on product_history_v3 (id, ts) compression=LZ4;
      create local index product_history_v3_merchantid_ts on product_history_v3 (merchantid, ts) include (id) compression=LZ4;
      create local index product_history_v3_storeid_ts on product_history_v3 (storeid, ts) include (id) compression=LZ4;
      

      Simple select by merchanid ordering by id,ts returns correct results:

      0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30;
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |                                                                                                     PLAN                                                                                                     |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]  |
      |     SERVER FILTER BY FIRST KEY ONLY                                                                                                                                                                          |
      |     SERVER TOP 30 ROWS SORTED BY ["ID", "TS"]                                                                                                                                                              |
      | CLIENT MERGE SORT                                                                                                                                                                                            |
      | CLIENT LIMIT 30                                                                                                                                                                                           |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      5 rows selected (0,019 seconds)
      

      It runs very fast until I add product to selected fields (cause average length of product is 10kb).

      So I'm trying to fetch id,ts in subquery, and product in outer query. It runs fast, but returns incorrect results: set of rows doesn't match to set of rows returned by query above.

      0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30) order by id, ts limit 30;
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |                                                                                                         PLAN                                                                                                         |
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
      |     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]
      | CLIENT MERGE SORT
      | CLIENT LIMIT 30
      |     SKIP-SCAN-JOIN TABLE 0
      |         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
      |             SERVER FILTER BY FIRST KEY ONLY
      |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS
      |         CLIENT MERGE SORT
      |         CLIENT 30 ROW LIMIT
      |     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472))
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      11 rows selected (0,021 seconds)
      

      However, if I change ordering a bit, so planner is forced for reordering, then set of rows is equal to original query:

      0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id||'-', ts limit 30) order by id, ts limit 30;
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |                                                                                                         PLAN
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
      |     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]
      | CLIENT MERGE SORT
      | CLIENT LIMIT 30
      |     SKIP-SCAN-JOIN TABLE 0
      |         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
      |             SERVER FILTER BY FIRST KEY ONLY
      |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"]
      |         CLIENT MERGE SORT
      |         CLIENT TOP 30 ROWS SORTED BY [("ID" || '-'), "TS"]
      |     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($494.$497, $494.$496))
      +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
      11 rows selected (0,02 seconds)
      
      12 rows selected (0,021 seconds)
      

      There, certainly, should be a lot of rows to trigger this behaviour.

      Attachments

        Activity

          People

            Unassigned Unassigned
            funny_falcon Sokolov Yura
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: