Details
Type: New Feature
Status: Closed
Priority: Minor
Resolution: Won't Fix
1.5.0
None
Description
Currently, Table objects of the Table API / SQL are treated like virtual views, i.e., all relational operators that have been applied on them are recorded and translated when a Table is emitted to a TableSink or converted into a DataSet or DataStream.
If a Table is accessed twice, the (sub-)query that it represents is translated twice into a DataSet or DataStream program and hence also executed twice, which is inefficient. Currently, the only way to avoid this is to convert the Table into a DataSet or DataStream, which causes the optimizer to generate a plan, and then register that DataSet or DataStream back as a new Table.
We should offer a method to internally "materialize" a Table object, i.e., to optimize it, generate a plan, and register the plan as an internal table. All queries / operations that are evaluated on the materialized Table would then start from the same DataSet or DataStream, so that it is not computed multiple times.
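The difference between the view-like behavior and the proposed materialization can be illustrated with a small toy model in plain Java. This is only a sketch and does not use the Flink API: `LazyTable`, `map`, `collect`, and `materialize` are hypothetical names invented for the illustration, and the "plan" is just a `Supplier` that recomputes its rows on every call. In Flink the sharing would happen at the level of the generated DataSet or DataStream rather than by caching rows in memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Toy model of a lazily translated table. NOT part of the Flink API. */
public class MaterializeSketch {

    /** A "virtual view": it only records how to compute its rows. */
    static final class LazyTable<T> {
        private final Supplier<List<T>> plan;

        LazyTable(Supplier<List<T>> plan) {
            this.plan = plan;
        }

        /** Records another operator on top of this table without running anything. */
        <R> LazyTable<R> map(Function<T, R> fn) {
            return new LazyTable<>(
                () -> plan.get().stream().map(fn).collect(Collectors.toList()));
        }

        /** Emitting the table runs the whole recorded plan from scratch. */
        List<T> collect() {
            return plan.get();
        }

        /** Hypothetical materialize(): the plan runs at most once, on first use. */
        LazyTable<T> materialize() {
            final Supplier<List<T>> source = plan;
            return new LazyTable<>(new Supplier<List<T>>() {
                private List<T> cached;

                @Override
                public synchronized List<T> get() {
                    if (cached == null) {
                        cached = source.get(); // first access computes the rows
                    }
                    return cached; // later accesses share the same result
                }
            });
        }
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        LazyTable<Integer> expensive = new LazyTable<>(() -> {
            runs.incrementAndGet(); // counts how often the shared sub-query executes
            return Arrays.asList(1, 2, 3);
        });

        // Two downstream queries on the plain view: the sub-query runs twice.
        expensive.map(x -> x * 2).collect();
        expensive.map(x -> x + 1).collect();
        System.out.println("runs without materialize: " + runs.get());

        // The same two queries on a materialized table share one execution.
        runs.set(0);
        LazyTable<Integer> shared = expensive.materialize();
        shared.map(x -> x * 2).collect();
        shared.map(x -> x + 1).collect();
        System.out.println("runs with materialize: " + runs.get());
    }
}
```

In this model the first pair of queries executes the shared sub-query twice, while the pair built on the materialized table executes it once. The sketch caches lazily on first access; an eager variant that computes the result inside `materialize()` itself would be an equally plausible reading of the proposal.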