Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-3670

KeyRange.intersect(List<KeyRange> , List<KeyRange>) is inefficient,especially for join dynamic filter

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 4.9.0
    • 4.10.0
    • None
    • None

    Description

      In my business system, there is a following join SQL(which is simplified), fact_table is a fact table, joining dimension table dim_table1 and dim_table2 :

       
      select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
      

      I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small dataset, the sql executes quickly, but when the dataset is bigger, the sql becomes very slowly,when the row count of fact_table is 30 million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

      When I debug the SQL executing, I find RHS1 return 5523 rows:

       
         select d1.id from dim_table1 d1 where d1.code = 2008
      

      and RHS2 return 23881 rows:

         select d2.id from dim_table2 d2 where d2.region='us'
      

      then HashJoinPlan uses KeyRange.intersect(List<KeyRange> , List<KeyRange> ) method to compute RHS1 intersecting RHS2 for join dynamic filter, narrowing down fact_table.cust_id should be.

      Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql execution only costs 17s.After I read the code of KeyRange.intersect method,I find following two problem:

      (1) The double loop is inefficient in line 521 and line 522,when keyRanges size is M, keyRanges2 size is N, the time complexity is O(M*N), for my example,is 5523*23881:

       
      519 public static List<KeyRange> intersect(List<KeyRange> keyRanges,  List<KeyRange> keyRanges2) {
      520        List<KeyRange> tmp = new ArrayList<KeyRange>();
      521        for (KeyRange r1 : keyRanges) {
      522            for (KeyRange r2 : keyRanges2) {
      523                KeyRange r = r1.intersect(r2);
      524                if (EMPTY_RANGE != r) {
      525                    tmp.add(r);
      526                }
      527            }
      528        }
      

      (2) line 540 shoule be r = r.union(tmp.get( i )), not intersect, just as KeyRange.coalesce method does:

       
      532        Collections.sort(tmp, KeyRange.COMPARATOR);
      533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
      534        KeyRange r = tmp.get(0);
      535        for (int i=1; i<tmp.size(); i++) {
      536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
      537                tmp2.add(r);
      538                r = tmp.get(i);
      539            } else {
      540                r = r.intersect(tmp.get(i));
      541            }
      542        }
      

      and it seems that no unit tests for this KeyRange.intersect(List<KeyRange> , List<KeyRange>) method.

      Attachments

        1. PHOENIX-3670_v1.patch
          16 kB
          chenglei

        Activity

          People

            comnetwork chenglei
            comnetwork chenglei
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: