Currently, multi-insert queries are not optimized by Calcite. Proper integration with Calcite would include creating a spool operator whose output is reused by every insert statement; however, a spool operator has not been added to Calcite yet (
In the meantime, since the complex logic of a multi-insert query lives in its FROM clause, we can optimize the FROM clause with Calcite and connect the optimized result back to the original query.
Initially, we will recognize three different cases:
- The FROM clause is trivial (e.g., a plain table reference) or is not supported. There is no need to optimize it with Calcite.
- The FROM clause is a subquery. Optimize the subquery with Calcite.
- The FROM clause is a join. Rewrite the join into a subquery and optimize it with Calcite. Change the column references in the INSERT statements to refer to the subquery's columns.
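To illustrate the join case, here is a rough sketch of what such a rewrite could look like in HiveQL. The table names (src, dim, t1, t2), the subquery alias (sq), and the column aliases (a_key, a_value, b_value) are hypothetical and chosen only for this example; the actual aliases produced by the rewrite may differ.

```sql
-- Original multi-insert query: the FROM clause is a join.
FROM src a JOIN dim b ON a.key = b.key
INSERT OVERWRITE TABLE t1 SELECT a.key, b.value WHERE a.key < 100
INSERT OVERWRITE TABLE t2 SELECT a.key, a.value WHERE a.key >= 100;

-- After the rewrite: the join is wrapped in a subquery, which is the part
-- handed to Calcite. Each INSERT branch now refers to the subquery's columns
-- instead of the original table aliases.
FROM (
  SELECT a.key AS a_key, a.value AS a_value, b.value AS b_value
  FROM src a JOIN dim b ON a.key = b.key
) sq
INSERT OVERWRITE TABLE t1 SELECT sq.a_key, sq.b_value WHERE sq.a_key < 100
INSERT OVERWRITE TABLE t2 SELECT sq.a_key, sq.a_value WHERE sq.a_key >= 100;
```

Giving every projected column a unique alias in the subquery avoids ambiguity when both sides of the join expose a column with the same name, such as key and value here.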
This should also benefit the processing of MERGE statements, since Hive treats MERGE statements as multi-insert queries.