Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-5389

Push down PostFilter to Sub-JoinTable for SortMergeJoin and NoStarJoin

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.14.2
    • Fix Version/s: 4.15.0, 5.1.0
    • Labels:
      None

      Description

      Given following tables:

         create table order_table
         (order_id varchar(15) not null primary key, 
          customer_id varchar(10), 
          item_id varchar(10), 
          price integer, 
          quantity integer, 
          date timestamp)
      
         create table item_table
         (item_id varchar(10) not null primary key, 
           name varchar, 
           price integer, 
           discount1 integer, 
           discount2 integer, 
           supplier_id varchar(10), 
           description varchar)
      
         create table supplier_table
          (supplier_id varchar(10) not null primary key, 
            name varchar, 
            phone varchar(12), 
            address varchar, 
            loc_id varchar(5))
      

      for following sql :

      select /*+ USE_SORT_MERGE_JOIN */  o.order_id,i.item_id, s.supplier_id, s.name from 
      supplier_table s inner join item_table i on  s.supplier_id = i.supplier_id 
      inner join order_table o on  i.item_id = o.item_id 
      where  s.address = 'hai' or i.supplier_id != 'medi'
      

      When using SortMergeJoin, the above join is composited as:
      (supplier_table inner join item_table) inner join order_table, which doing the supplier_table s inner join item_table first, and then join order_table.
      Obviously, the where condition s.address = 'hai' or i.supplier_id != 'medi' could be pushed down to the first join (supplier_table inner join item_table),because it only references the columns of supplier_table and item_table.
      but unfortunately, this where condition s.address = 'hai' or i.supplier_id != 'medi' does not be pushed down as expected, and it becomes the postFilter of the whole join (supplier_table s inner join item_table i) inner join order_table. Of course, it is very ineffiecient.

      This problem is caused by the following code line 431 in QueryCompiler.compileJoinQuery:

      430    case SORT_MERGE: {
      431                JoinTable lhsJoin = joinTable.getSubJoinTableWithoutPostFilters();
      432                JoinSpec lastJoinSpec = joinSpecs.get(joinSpecs.size() - 1);
      433                JoinType type = lastJoinSpec.getType();
      434                JoinTable rhsJoin = lastJoinSpec.getJoinTable();
      

      At the beginning of SORT_MERGE, above line 431 create a new JoinTable for (supplier_table inner join item_table) using following JoinTable.getSubJoinTableWithoutPostFilters() method:

      441  public JoinTable getSubJoinTableWithoutPostFilters() {
      442            return joinSpecs.size() > 1 ? new JoinTable(table, joinSpecs.subList(0, joinSpecs.size() - 1)) :
      443                new JoinTable(table);
      444        }
      

      JoinTable.getSubJoinTableWithoutPostFilters() only creates the new JoinTable and does not try to push down the where condition s.address = 'hai' or i.supplier_id != 'medi' to the new created JoinTable, so cause this problem.

        Attachments

        1. PHOENIX-5389_v3-4.x-HBase-1.4.patch
          70 kB
          chenglei
        2. PHOENIX-5389_v2-4.x-HBase-1.4.patch
          69 kB
          chenglei
        3. PHOENIX-5389_v1-4.x-HBase-1.4.patch
          65 kB
          chenglei

          Activity

            People

            • Assignee:
              comnetwork chenglei
              Reporter:
              comnetwork chenglei
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h 50m
                2h 50m