Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Druid 0.11 introduces a built-in capability called Expressions, which makes it possible to push down expression-based projects, aggregates and filters.
To leverage this new feature, some changes need to be made to the Druid Calcite adapter.
The functions and expressions currently supported by Druid are listed here:
http://druid.io/docs/latest/misc/math-expr.html
As the docs show, an expression can be an actual tree of operators.
Expressions can be used with Filters, Projects, Aggregates, PostAggregates and
Having filters. For Filters, there will be a new filter kind called Expression filter.
You might ask whether everything can be pushed as an Expression filter. The short answer
is no, because the other kinds of Druid filters perform better when they can be used. Hence the
Expression filter is a plan B sort of thing. In order to push expressions as
Projects and Aggregates, we will be using Expression-based Virtual Columns.
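To make the two JSON shapes concrete, here is a minimal sketch that builds an expression-based virtual column and an expression filter by hand. The class and method names (`ExpressionJsonSketch`, `virtualColumn`, `expressionFilter`, `quote`) are hypothetical and not part of the adapter; the field names are assumed to follow Druid's documented `expression` virtual column and `expression` filter, and the output type is simply passed in by the caller.

```java
/** Illustration only: hand-built JSON for Druid expression objects. */
final class ExpressionJsonSketch {
  private ExpressionJsonSketch() {}

  /** Wraps text in double quotes, escaping backslashes and double quotes for JSON. */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** JSON for a virtual column whose value is computed by a Druid expression. */
  static String virtualColumn(String name, String expression, String outputType) {
    return "{\"type\":\"expression\",\"name\":" + quote(name)
        + ",\"expression\":" + quote(expression)
        + ",\"outputType\":" + quote(outputType) + "}";
  }

  /** JSON for the fallback filter kind that evaluates a Druid expression per row. */
  static String expressionFilter(String expression) {
    return "{\"type\":\"expression\",\"expression\":" + quote(expression) + "}";
  }
}
```

A projected `price * 2` would then become a virtual column such as `virtualColumn("vc0", "(\"price\" * 2)", "DOUBLE")`, which a scan or aggregation can reference by the name `vc0` like any other column.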
The major change is merging the pushdown verification code with
the translation of RexCall/RexNode to Druid Json, Druid's native physical language. The
main driver behind this redesign is the fact that, in order to check whether we can
push down a tree of expressions to Druid, we have to compute the Druid Expression
String anyway. Thus, instead of having two different code paths, one for pushdown
validation and one for Json generation, we can have one function that does both.
For instance, instead of having one code path to check whether a given filter
can be pushed and then a separate translation layer, we will have
one function that either returns a valid Druid Filter or returns null if pushdown is not
possible. The same idea will be applied to how we push Projects,
Aggregates, Post Aggregates and Sort.
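The "one function that translates or returns null" idea can be sketched as follows. This is not the adapter's code: `Comparison` is a hypothetical stand-in for a comparison RexCall, only `=` and `<>` are handled, and all literals are assumed to be strings. The sketch prefers a native selector filter, falls back to an expression filter, and returns null when neither applies, so the caller needs no separate validation pass.

```java
/** Illustration only: validation and JSON generation in a single pass. */
final class FilterPushdownSketch {
  private FilterPushdownSketch() {}

  /** Hypothetical stand-in for a comparison RexCall: column, operator, right operand. */
  static final class Comparison {
    final String column;
    final String operator;          // SQL operator, e.g. "=" or "<>"
    final String operand;           // literal value or column name
    final boolean operandIsColumn;  // true when the right side is a column

    Comparison(String column, String operator, String operand, boolean operandIsColumn) {
      this.column = column;
      this.operator = operator;
      this.operand = operand;
      this.operandIsColumn = operandIsColumn;
    }
  }

  /** Returns Druid filter JSON, or null when the comparison cannot be pushed down. */
  static String toDruidFilter(Comparison c) {
    if ("=".equals(c.operator) && !c.operandIsColumn) {
      // Preferred path: a native selector filter.
      return "{\"type\":\"selector\",\"dimension\":" + jsonString(c.column)
          + ",\"value\":" + jsonString(c.operand) + "}";
    }
    // Plan B: an expression filter, if the expression can be built at all.
    String expression = toExpression(c);
    if (expression == null) {
      return null;
    }
    return "{\"type\":\"expression\",\"expression\":" + jsonString(expression) + "}";
  }

  /** Returns a Druid expression string, or null for operators this sketch does not know. */
  static String toExpression(Comparison c) {
    final String druidOperator;
    if ("=".equals(c.operator)) {
      druidOperator = "==";
    } else if ("<>".equals(c.operator)) {
      druidOperator = "!=";
    } else {
      return null;
    }
    // Columns are double-quoted and string literals single-quoted in Druid expressions.
    String right = c.operandIsColumn ? "\"" + c.operand + "\"" : "'" + c.operand + "'";
    return "(\"" + c.column + "\" " + druidOperator + " " + right + ")";
  }

  /** Wraps text in double quotes, escaping backslashes and double quotes for JSON. */
  static String jsonString(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

With this shape, a filter like `column_A = column_B` is not rejected outright: it has no selector form, so it goes through the expression path, while an unsupported operator simply yields null.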
Here are the main elements/classes of the new design. The first is merging the logic of
translating Literals/InputRef/RexCall to a Druid physical representation.
A leaf RexNode is translated to a valid pair of Druid column + extraction function, if possible:
/**
 * @param rexNode    leaf Input Ref to Druid Column
 * @param rowType    row type
 * @param druidQuery druid query
 *
 * @return {@link Pair} of Column name and Extraction Function on the top of the input ref or
 * {@link Pair of(null, null)} when can not translate to valid Druid column
 */
protected static Pair<String, ExtractionFunction> toDruidColumn(RexNode rexNode,
    RelDataType rowType, DruidQuery druidQuery)
On the other hand, in order to convert Literals to Druid Literals, we will introduce
/**
 * @param rexNode    rexNode to translate to Druid literal equivalent
 * @param rowType    rowType associated to rexNode
 * @param druidQuery druid Query
 *
 * @return non null string or null if it can not translate to valid Druid equivalent
 */
@Nullable
private static String toDruidLiteral(RexNode rexNode, RelDataType rowType,
    DruidQuery druidQuery)
The main new functions used for node pushdown and Druid Json generation are listed below.
Filter pushdown verification and generation is done via
org.apache.calcite.adapter.druid.DruidJsonFilter#toDruidFilters
For project pushdown added
org.apache.calcite.adapter.druid.DruidQuery#computeProjectAsScan.
For Grouping pushdown added
org.apache.calcite.adapter.druid.DruidQuery#computeProjectGroupSet.
For Aggregation pushdown added
org.apache.calcite.adapter.druid.DruidQuery#computeDruidJsonAgg
For sort pushdown added
org.apache.calcite.adapter.druid.DruidQuery#computeSort
Pushing of PostAggregates will use Expression post aggregates and use
org.apache.calcite.adapter.druid.DruidExpressions#toDruidExpression
to generate the expression.
For Expression computation, most of the work is done here:
org.apache.calcite.adapter.druid.DruidExpressions#toDruidExpression
This static function generates a Druid expression String from a given RexNode, or returns null if that is not possible.
@Nullable
public static String toDruidExpression(
final RexNode rexNode,
final RelDataType inputRowType,
final DruidQuery druidRel
)
In order to support various kinds of expressions, the following interface was added:
org.apache.calcite.adapter.druid.DruidSqlOperatorConverter
Thus users can implement a custom expression converter based on the SqlOperator syntax and signature.
public interface DruidSqlOperatorConverter {
  /**
   * Returns the calcite SQL operator corresponding to Druid operator.
   *
   * @return operator
   */
  SqlOperator calciteOperator();

  /**
   * Translate rexNode to valid Druid expression.
   * @param rexNode    rexNode to translate to Druid expression
   * @param rowType    row type associated with rexNode
   * @param druidQuery druid query used to figure out configs/fields related like timeZone
   *
   * @return valid Druid expression or null if it can not convert the rexNode
   */
  @Nullable
  String toDruidExpression(RexNode rexNode, RelDataType rowType, DruidQuery druidQuery);
}
The DruidQuery class will provide
org.apache.calcite.adapter.druid.DruidQuery#getOperatorConversionMap which is a
map of SqlOperator to DruidSqlOperatorConverter.
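How a converter map can drive recursive expression generation is sketched below. To keep the example self-contained it does not use Calcite types: `Node` is a hypothetical stand-in for RexNode, `Converter` is a simplified stand-in for DruidSqlOperatorConverter, and the map is keyed by operator name rather than by SqlOperator. The translation returns null as soon as any operator has no registered converter or any operand fails to translate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: operator-to-converter lookup with null on failure. */
final class ConverterMapSketch {
  private ConverterMapSketch() {}

  /** Hypothetical stand-in for RexNode: a column reference (no operands) or a call. */
  static final class Node {
    final String name;          // column name for a leaf, operator name for a call
    final List<Node> operands;  // empty for a column reference

    Node(String name, Node... operands) {
      this.name = name;
      this.operands = Arrays.asList(operands);
    }

    boolean isColumn() {
      return operands.isEmpty();
    }
  }

  /** Simplified stand-in for DruidSqlOperatorConverter. */
  interface Converter {
    /** Returns a Druid expression, or null if the call cannot be converted. */
    String toDruidExpression(Node call, Map<String, Converter> converters);
  }

  /** Translates a node tree to a Druid expression string, or returns null. */
  static String translate(Node node, Map<String, Converter> converters) {
    if (node.isColumn()) {
      return "\"" + node.name + "\"";
    }
    Converter converter = converters.get(node.name);
    return converter == null ? null : converter.toDruidExpression(node, converters);
  }

  /** Builds a converter for a binary infix operator such as + or *. */
  static Converter binary(String druidOperator) {
    return (call, converters) -> {
      if (call.operands.size() != 2) {
        return null;
      }
      String left = translate(call.operands.get(0), converters);
      String right = translate(call.operands.get(1), converters);
      if (left == null || right == null) {
        return null;  // one untranslatable operand makes the whole call untranslatable
      }
      return "(" + left + " " + druidOperator + " " + right + ")";
    };
  }

  /** A small default map; callers can add their own converters to it. */
  static Map<String, Converter> defaultConverters() {
    Map<String, Converter> map = new HashMap<>();
    map.put("PLUS", binary("+"));
    map.put("TIMES", binary("*"));
    return map;
  }
}
```

Because lookup goes through the map, supporting a new operator in this sketch means registering one more entry instead of touching the recursive translation itself, which is the extension point the interface above is meant to offer.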
Any feedback is welcome.
Issue Links
- contains
-
HIVE-18226 handle UDF to double/int over aggregate
- Resolved
-
HIVE-14518 Support 'having' translation for Druid GroupBy queries
- Closed
-
CALCITE-2119 Druid Filter validation Logic broken for filters like column_A = column_B
- Closed
-
HIVE-17716 Not pushing postaggregations into Druid due to CAST on constant
- Open