Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-1598 Encode column names to save space and improve performance
  3. PHOENIX-3667

Optimize BooleanExpressionFilter and ColumnProjectionFilter for tables with encoded columns

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      The client side of Phoenix determines the subclass of BooleanExpressionFilter we use based on how many column families and column qualifiers are being referenced. The idea is to minimize the lookup cost during filter evaluation. For encoded columns, instead of using a Map or Set, we can create a few new subclasses of BooleanExpressionFilter that use an array instead. No need for any lookups or equality checks - just fill in the position based on the column qualifier value instead. Since filters are applied on every row between the start/stop key, this will improve performance quite a bit.

      1. PHOENIX-3667_wip.patch
        30 kB
        Samarth Jain
      2. PHOENIX-3667.patch
        44 kB
        Samarth Jain
      3. WhereClause.jpg
        87 kB
        Samarth Jain
      4. PHOENIX-3667_v2.patch
        44 kB
        Samarth Jain

        Activity

        Hide
        samarthjain Samarth Jain added a comment -

        This sounds like a good optimization to do. There are a few more tweaks we can do here:

        • for encoded mutable tables, because column qualifiers are unique across all column families, we don’t need to compare column families.
        • for encoded mutable tables, stop after we have encountered a column qualifier that is >= max qualifier we expect.
        • We should count the number of unique column qualifiers in our where clause expression using the KeyValueColumnExpressionVisitor. This will be used for us to determine the size of the array we will be using for storing the foundColumns. For non-encoded tables this count can be used to properly size the map of foundColumns.
        Show
        samarthjain Samarth Jain added a comment - This sounds like a good optimization to do. There are a few more tweaks we can do here: for encoded mutable tables, because column qualifiers are unique across all column families, we don’t need to compare column families. for encoded mutable tables, stop after we have encountered a column qualifier that is >= max qualifier we expect. We should count the number of unique column qualifiers in our where clause expression using the KeyValueColumnExpressionVisitor. This will be used for us to determine the size of the array we will be using for storing the foundColumns. For non-encoded tables this count can be used to properly size the map of foundColumns.
        Hide
        samarthjain Samarth Jain added a comment -

        For encoded mutable tables, stop after we have encountered a column qualifier that is >= max qualifier we expect.

        James Taylor, do filters need to return ReturnCode.NEXT_COL to make sure that we don't move on to the next row since we may need to return columns that in the select part of the sql too? If yes, then this would be more work as then I would need to include the columns in the select part of the sql while determining the max qualifier.

        Or does adding a column into the scan via scan.addCol() is enough and orthogonal to filters.

        Show
        samarthjain Samarth Jain added a comment - For encoded mutable tables, stop after we have encountered a column qualifier that is >= max qualifier we expect. James Taylor , do filters need to return ReturnCode.NEXT_COL to make sure that we don't move on to the next row since we may need to return columns that in the select part of the sql too? If yes, then this would be more work as then I would need to include the columns in the select part of the sql while determining the max qualifier. Or does adding a column into the scan via scan.addCol() is enough and orthogonal to filters.
        Hide
        jamestaylor James Taylor added a comment -

        Good point. For columns that are projected, but not referenced in the where clause, I think we'd need to return NEXT_COL.

        Show
        jamestaylor James Taylor added a comment - Good point. For columns that are projected, but not referenced in the where clause, I think we'd need to return NEXT_COL.
        Hide
        samarthjain Samarth Jain added a comment -

        Parking a WIP patch. Existing tests pass. Will run perf tests now to evaluate the benefit.

        Show
        samarthjain Samarth Jain added a comment - Parking a WIP patch. Existing tests pass. Will run perf tests now to evaluate the benefit.
        Hide
        jamestaylor James Taylor added a comment -

        A few minor nits on patch for EncodedCQIncrementalResultTuple:

        • Can we name markTupleAsImmutable method as setImmutable to match the other one?
        • The refCount member variable goes from 0 to expectedCardinality directly. How about adding a setCell method to EncodedCQIncrementalResultTuple and calling that instead of making the filteredKeyValues.setCell call this directly? Then you could increment refCount when that's called.
          +        if (isQualifierForColumnInWhereExpression(qualifier)) {
          +            filteredKeyValues.setCell(qualifier, cell);
          
        • Can the getValue(int index) method call filteredKeyValues.get(index)? I know it won't be efficient, but it's not currently called anyway.
        Show
        jamestaylor James Taylor added a comment - A few minor nits on patch for EncodedCQIncrementalResultTuple: Can we name markTupleAsImmutable method as setImmutable to match the other one? The refCount member variable goes from 0 to expectedCardinality directly. How about adding a setCell method to EncodedCQIncrementalResultTuple and calling that instead of making the filteredKeyValues.setCell call this directly? Then you could increment refCount when that's called. + if (isQualifierForColumnInWhereExpression(qualifier)) { + filteredKeyValues.setCell(qualifier, cell); Can the getValue(int index) method call filteredKeyValues.get(index)? I know it won't be efficient, but it's not currently called anyway.
        Hide
        samarthjain Samarth Jain added a comment -

        Updated patch with review comments. I also added a new filter - EncodedQualifiersProjectionFilter which removes the need to using a TreeMap for tracking the column qualifiers. I have instead used a BitSet where the index into the bit set is the decoded column qualifier. Will list perf test results in the next comment.

        Show
        samarthjain Samarth Jain added a comment - Updated patch with review comments. I also added a new filter - EncodedQualifiersProjectionFilter which removes the need to using a TreeMap for tracking the column qualifiers. I have instead used a BitSet where the index into the bit set is the decoded column qualifier. Will list perf test results in the next comment.
        Hide
        samarthjain Samarth Jain added a comment -

        My first perf test was to track query performance as we increase the number of columns we filter by in the where clause. As the number of columns increased, the query performance kept getting worse with the older filter. Around 20 columns or so, the perf difference was close to 100%.

        Show
        samarthjain Samarth Jain added a comment - My first perf test was to track query performance as we increase the number of columns we filter by in the where clause. As the number of columns increased, the query performance kept getting worse with the older filter. Around 20 columns or so, the perf difference was close to 100%.
        Hide
        jamestaylor James Taylor added a comment -

        This looks good, except for the EncodedCQIncrementalResultTuple.getValue(int index) call. When used, this call allows the Cells to be iterated over like this:

            for (int i = 0; i < tuple.size(); i++) {
                System.out.println( tuple.getValue(i) );
            }
        

        Here's how I'd recommend implementing getCellForIndex (with a javadoc comment that let's folks know it won't perform well):

            public void Cell getCellAtIndex(int index) {
                int bitIndex;
                for (bitIndex = filteredQualifiers.nextSetBit(0); bitIndex >= 0 && index >= 0; bitIndex = filteredQualifiers.nextSetBit(bitIndex+1)) {
                    index--;
                }
                if (bitIndex < 0) {
                    throw new NoSuchElementException();
                }
                return filteredCells[bitIndex];
            }
        
        Show
        jamestaylor James Taylor added a comment - This looks good, except for the EncodedCQIncrementalResultTuple.getValue(int index) call. When used, this call allows the Cells to be iterated over like this: for ( int i = 0; i < tuple.size(); i++) { System .out.println( tuple.getValue(i) ); } Here's how I'd recommend implementing getCellForIndex (with a javadoc comment that let's folks know it won't perform well): public void Cell getCellAtIndex( int index) { int bitIndex; for (bitIndex = filteredQualifiers.nextSetBit(0); bitIndex >= 0 && index >= 0; bitIndex = filteredQualifiers.nextSetBit(bitIndex+1)) { index--; } if (bitIndex < 0) { throw new NoSuchElementException(); } return filteredCells[bitIndex]; }
        Hide
        samarthjain Samarth Jain added a comment -

        Thanks for the reviews, James Taylor. Updated patch with the getCellAtIndex() method as you suggested.

        Show
        samarthjain Samarth Jain added a comment - Thanks for the reviews, James Taylor . Updated patch with the getCellAtIndex() method as you suggested.
        Hide
        jamestaylor James Taylor added a comment -

        +1. Looks great, Samarth Jain.

        Show
        jamestaylor James Taylor added a comment - +1. Looks great, Samarth Jain .

          People

          • Assignee:
            samarthjain Samarth Jain
            Reporter:
            jamestaylor James Taylor
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development