Flink / FLINK-5226

Eagerly project unused attributes

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: Table SQL / API
    • Labels: None

    Description

      The optimizer currently does not eagerly remove unused attributes.
      For example, given a table tab5 with five attributes (a, b, c, d, e), the following query

      SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
      

      would result in the non-optimized plan

      LogicalProject(a=[$0], b=[$6])
        LogicalFilter(condition=[=($0, $5)])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalTableScan(table=[[tab5]])
            LogicalTableScan(table=[[tab5]])
      

      and the optimized plan:

      DataSetCalc(select=[a, b0 AS b])
        DataSetJoin(where=[=(a, a0)], join=[a, b, c, d, e, a0, b0, c0, d0, e0], joinType=[InnerJoin])
          DataSetScan(table=[[_DataSetTable_0]])
          DataSetScan(table=[[_DataSetTable_0]])
      

      This plan is inefficient because the join carries all ten attributes (five from each input) instead of eagerly projecting out the unused fields (x.b, x.c, x.d, x.e, y.c, y.d, y.e) below the join.
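
To make the list of unused fields concrete, here is a minimal sketch that derives it from the query. It is a toy model, not Flink or Calcite code: the helper `required_fields` and the way references are encoded as `(alias, field)` pairs are assumptions made only for illustration.

```python
# Hypothetical helper, not part of Flink or Calcite: for each join input,
# split its attributes into those the query references and those it does not.

def required_fields(schema, select_refs, predicate_refs):
    """Return {alias: (kept, dropped)}, preserving the schema's field order."""
    used = set(select_refs) | set(predicate_refs)
    result = {}
    for alias, fields in schema.items():
        kept = [f for f in fields if (alias, f) in used]
        dropped = [f for f in fields if (alias, f) not in used]
        result[alias] = (kept, dropped)
    return result

tab5 = ["a", "b", "c", "d", "e"]
schema = {"x": tab5, "y": tab5}

# SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
select_refs = [("x", "a"), ("y", "b")]
predicate_refs = [("x", "a"), ("y", "a")]

plan = required_fields(schema, select_refs, predicate_refs)
for alias, (kept, dropped) in plan.items():
    print(alias, "keep:", kept, "drop:", dropped)
```

Under these assumptions, only x.a, y.a and y.b need to reach the join; the other seven attributes can be dropped directly above the scans.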

      Since this is one of the most common optimizations, I would assume that Calcite provides some rules to extract eager projections. If this is the case, the issue can be solved by adding such rules to FlinkRuleSets.
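
What such a rule has to do can be sketched on the index-based plan shown above. The following is a toy model only: `push_project_below_join` is a hypothetical function, not Calcite's or Flink's API, and it handles just a single equality condition on an inner join. Calcite does ship rules in this family (for example `ProjectJoinTransposeRule`), but whether that particular rule is the right addition to FlinkRuleSets is not stated in this issue.

```python
# Toy model of pushing a projection below an inner join. All names here are
# hypothetical; this is not Calcite's or Flink's API.

def push_project_below_join(project, condition, left_width):
    """Narrow both join inputs to the fields that are actually referenced.

    project:    indices into the join output selected by the top projection
    condition:  (i, j) meaning field i == field j on the join output
    left_width: number of fields contributed by the left input

    Returns (left_keep, right_keep, new_condition, new_project). The *_keep
    lists hold indices local to each input; the new condition and projection
    refer to the narrowed join output.
    """
    needed = sorted(set(project) | set(condition))
    left_keep = [i for i in needed if i < left_width]
    right_keep = [i - left_width for i in needed if i >= left_width]
    # Map each old join-output index to its position in the narrowed output.
    remap = {old: new for new, old in enumerate(needed)}
    new_condition = tuple(remap[i] for i in condition)
    new_project = [remap[i] for i in project]
    return left_keep, right_keep, new_condition, new_project

# LogicalProject(a=[$0], b=[$6]) over a join filtered by =($0, $5),
# where each tab5 scan contributes five fields.
left_keep, right_keep, cond, proj = push_project_below_join([0, 6], (0, 5), 5)
print(left_keep, right_keep, cond, proj)
```

In this model the left scan is narrowed to field 0 (x.a), the right scan to fields 0 and 1 (y.a, y.b), and the join then carries three attributes instead of ten. The index remapping is the part a real rule has to get right, because every reference above the join shifts once columns are removed.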

            People

              Assignee: Fabian Hueske (fhueske)
              Reporter: Fabian Hueske (fhueske)