Apache Storm / STORM-1446

Compile the Calcite logical plan to Storm Trident logical plan

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.1.0
    • Component/s: storm-sql
    • Labels:
      None

      Description

      As suggested in https://issues.apache.org/jira/browse/STORM-1040?focusedCommentId=15036651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036651, compiling the logical plan from Calcite down to a Storm physical plan will clarify the implementation of StormSQL.

      > Motivation for this big change, and its benefits
      This started from Julian's comment and Milinda's comment.

      For me, having our own relational expressions (rels) has several advantages:

      • We can push operator handling logic into the rel itself. Previously we had to traverse Calcite's logical rel tree with PostOrderRelNodeVisitor, and the visitor had to handle Calcite's rels directly. Now the logic for configuring the Trident topology is handled entirely by the individual rels.
      • We sometimes want more derived rels than Calcite's logical operators provide. Join is one example: Calcite has only one logical rel for joins, LogicalJoin, but we now convert LogicalJoin to EquiJoin when the conditions are met. This will make a difference once we have various types of join. We are not ready yet, but streaming scan vs. table scan and streaming insert vs. table insert are other such cases.
      TridentStormAggregateRel(group=[{0}], EXPR$1=[COUNT()])
        TridentStormCalcRel(expr#0..4=[{inputs}], expr#5=[0], expr#6=[>($t0, $t5)], DEPTID=[$t3], EMPID=[$t0], $condition=[$t6])
          TridentStormEquiJoinRel(condition=[=($2, $3)], joinType=[inner])
            TridentStormStreamScanRel(table=[[EMP]])
            TridentStormStreamScanRel(table=[[DEPT]])
      
      • We can even override how a rel is represented in the explain string if we think Calcite's explain is not informative enough. For example, Scan could show its initial parallelism (once we support it).
      • We can apply query optimizations: defining derived rels helps further query optimizations such as filter pushdown. Calcite's rels are not aware of data source characteristics, and we can include those in our own rels.
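
      The first three points above can be illustrated with a small, self-contained Java sketch. It is not the storm-sql implementation and does not use Calcite's API: RelSketch, Rel, StreamScan, LogicalJoin, EquiJoin, convert, explain and planOf are all hypothetical names invented for this illustration, and the "plan" is just a list of strings standing in for Trident topology calls. It assumes a join can be converted only when its condition is a plain equality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model. None of these types are Calcite or storm-sql classes.
final class RelSketch {

    /** A plan node that knows how to describe itself and how to plan itself. */
    interface Rel {
        String describe();              // one-line explain text for this node
        List<Rel> inputs();             // child rels, left to right
        void plan(List<String> steps);  // appends this subtree's steps; no external visitor
    }

    static final class StreamScan implements Rel {
        final String table;
        StreamScan(String table) { this.table = table; }
        public String describe() { return "StreamScan(table=[[" + table + "]])"; }
        public List<Rel> inputs() { return Collections.emptyList(); }
        public void plan(List<String> steps) { steps.add("scan " + table); }
    }

    /** Generic join as produced by the parser; it cannot be planned directly. */
    static final class LogicalJoin implements Rel {
        final Rel left, right;
        final String op;
        final int leftCol, rightCol;
        LogicalJoin(Rel left, Rel right, String op, int leftCol, int rightCol) {
            this.left = left; this.right = right; this.op = op;
            this.leftCol = leftCol; this.rightCol = rightCol;
        }
        public String describe() {
            return "LogicalJoin(condition=[" + op + "($" + leftCol + ", $" + rightCol + ")])";
        }
        public List<Rel> inputs() { return Arrays.asList(left, right); }
        public void plan(List<String> steps) {
            throw new UnsupportedOperationException("convert to a derived rel before planning");
        }
    }

    /** Derived rel: a join whose condition is known to be an equality. */
    static final class EquiJoin implements Rel {
        final Rel left, right;
        final int leftCol, rightCol;
        EquiJoin(Rel left, Rel right, int leftCol, int rightCol) {
            this.left = left; this.right = right;
            this.leftCol = leftCol; this.rightCol = rightCol;
        }
        // Overriding the explain text is a local change to this rel only.
        public String describe() {
            return "EquiJoin(condition=[=($" + leftCol + ", $" + rightCol + ")], joinType=[inner])";
        }
        public List<Rel> inputs() { return Arrays.asList(left, right); }
        public void plan(List<String> steps) {
            left.plan(steps);   // inputs first (post-order), driven by the rel itself
            right.plan(steps);
            steps.add("join $" + leftCol + " = $" + rightCol);
        }
    }

    /** Converts LogicalJoin to EquiJoin when the condition is an equality; otherwise returns rel unchanged. */
    static Rel convert(Rel rel) {
        if (rel instanceof LogicalJoin) {
            LogicalJoin j = (LogicalJoin) rel;
            if ("=".equals(j.op)) {
                return new EquiJoin(convert(j.left), convert(j.right), j.leftCol, j.rightCol);
            }
        }
        return rel;
    }

    /** Renders the tree with two spaces of indentation per level. */
    static String explain(Rel rel) {
        StringBuilder sb = new StringBuilder();
        explain(rel, 0, sb);
        return sb.toString();
    }

    private static void explain(Rel rel, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(rel.describe()).append('\n');
        for (Rel input : rel.inputs()) explain(input, depth + 1, sb);
    }

    static List<String> planOf(Rel rel) {
        List<String> steps = new ArrayList<>();
        rel.plan(steps);
        return steps;
    }
}
```

      In this sketch, converting a LogicalJoin over EMP and DEPT with condition =($2, $3) yields an EquiJoin whose plan method emits the two scans followed by the join step, while a join with any other operator is left as a LogicalJoin and refuses to plan. The design point is the one made in the list: planning and explain logic live in each rel, so adding a new derived rel does not require touching a central visitor.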

              People

              • Assignee: Jungtaek Lim (kabhwan)
              • Reporter: Haohui Mai (wheat9)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 3h