Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-5148

Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan and ClientAggregatePlan

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 4.14.1
    • 4.15.0, 5.1.0
    • None
    • None

    Description

      Given a table

        create table test ( 
             pk1 varchar not null , 
             pk2 varchar not null, 
             pk3 varchar not null,
              v1 varchar, 
              v2 varchar, 
              CONSTRAINT TEST_PK PRIMARY KEY ( 
                    pk1,
                    pk2,
                    pk3 ))
      

      Consider following three cases:

      1. OrderBy of ClientScanPlan

      for sql:

      select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
      

      Obviously, the outer OrderBy "order by v2,pk3" should be compiled out because it matchs the inner query OrderBy "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out.

      2. GroupBy of ClientAggregatePlan

      for sql :

      select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
      

      Obviously, the outer GroupBy "group by pk2,v1" should be orderPreserving because it matchs the inner query OrderBy "order by t.pk2,t.v1,t.pk1" , but unfortunately the isOrderPreserving() of outer GroupBy return false.

      3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)

      for sql:

      SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b

      The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time.
      Thus, both 1)

      SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b

      and 2)

      SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b

      should avoid doing an extra order-by after the sort-merge-join operation.

       

      All the above three cases are caused by the same problem that the OrderPreservingTracker relies solely on row keys for inferring alignment between the target Expression and the source sortedness.

      For following QueryCompiler.compileSingleQuery, because the inner query of above cases has OrderBy, so in line 534, local variable isInRowKeyOrder is false:

      518   protected QueryPlan compileSingleQuery(StatementContext context, SelectStatement select, List<Object> binds, boolean asSubquery, boolean allowPageFilter) throws SQLException{
      519           SelectStatement innerSelect = select.getInnerSelectStatement();
      520           if (innerSelect == null) {
      521                return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, null, null, true);
      522           }
      523  
      524           QueryPlan innerPlan = compileSubquery(innerSelect, false);
      525           TupleProjector tupleProjector = new TupleProjector(innerPlan.getProjector());
      526           innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, null);
      527
      528           // Replace the original resolver and table with those having compiled type info.
      529           TableRef tableRef = context.getResolver().getTables().get(0);
      530            ColumnResolver resolver = FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), tableRef, innerPlan.getProjector());
      531            context.setResolver(resolver);
      532           tableRef = resolver.getTables().get(0);
      533           context.setCurrentTable(tableRef);
      534           boolean isInRowKeyOrder = innerPlan.getGroupBy() == GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
      535
      536           return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
      537       }
      

      Because the OrderPreservingTracker relies solely on row keys, so in following line 170 of OrderByCompiler.compile , when isInRowKeyOrder is false, the OrderPreservingTracker.isOrderPreserving() checking is skipped:

      169     // If we're ordering by the order returned by the scan, we don't need an order by
      170      if (isInRowKeyOrder && tracker.isOrderPreserving()) {
      171             if (tracker.isReverse()) {
      172                  // Don't use reverse scan if:
      173                  // 1) we're using a skip scan, as our skip scan doesn't support this yet.
      174                  // 2) we have the FORWARD_SCAN hint set to choose to keep loading of column
      175                  //    families on demand versus doing a reverse scan
      176                  // REV_ROW_KEY_ORDER_BY scan would not take effect for a projected table, so don't return it for such table types.
      177                if (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
      178                        && !context.getScanRanges().useSkipScanFilter()
      179                        && context.getCurrentTable().getTable().getType() != PTableType.PROJECTED
      180                        && context.getCurrentTable().getTable().getType() != PTableType.SUBQUERY
      181                        && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
      182                    return OrderBy.REV_ROW_KEY_ORDER_BY;
      183               }
      184            } else {
      185                 return OrderBy.FWD_ROW_KEY_ORDER_BY;
      186           }
      187        }
      

       
      What we can do to fix this problem are :

      • refactor the OrderPreservingTracker to make it inferring alignment between the target Expression and the source sortedness based on OrderByExpression , not the row keys, so we can remove the variable isInRowKeyOrder.
      • add a new method getActualOutputOrderBys to the QueryPlan to get the output OrderBys of the query result , eg. for the above third SortMergeJoinPlan example ,getActualOutputOrderBys returns (T1.a, T1.b) and (T2.a, T2.b) .

      Besides above three cases, a implicit semantic optimization could be implemented also:
      for following sql , even the literal OrderBy of inner query is "order by t.pk1,t.pk2" . By getActualOutputOrderBys of inner QueryPlan, we can infer that the actual OrderBy of inner query is "order by t.pk1,t.pk2,t.pk3" , so the outer OrderBy "order by pk1,pk2,pk3" could be compiled out:

      select pk1 from (select pk3,pk2,pk1 from test t where v1 = '6' order by t.pk1,t.pk2 limit 10) a where pk3 > '8' order by pk1,pk2,pk3
      

      Attachments

        1. PHOENIX-5148_v3-4.x-HBase-1.2.patch
          188 kB
          chenglei
        2. PHOENIX-5148_v3-master.patch
          188 kB
          chenglei
        3. PHOENIX-5148_v3-4.x-HBase-1.4.patch
          188 kB
          chenglei
        4. PHOENIX-5148_v2-4.x-HBase-1.4.patch
          185 kB
          chenglei
        5. PHOENIX-5148-4.x-HBase-1.4.patch
          178 kB
          chenglei

        Issue Links

          Activity

            People

              comnetwork chenglei
              comnetwork chenglei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 9h 40m
                  9h 40m