Details
Improvement
Status: Closed
Minor
Resolution: Won't Fix
None
None
Description
The DataSet API provides semantic annotations [1] that tell the optimizer which input fields an operator forwards unchanged to its output. This information is valuable because the optimizer can infer that certain physical properties, such as partitioning or sorting, are not destroyed by user functions, and can therefore generate more efficient execution plans.
The Table API is built on top of the DataSet API and generates both the DataSet programs and the code for user-defined functions. Hence, it knows exactly which fields are modified and which are not. We should use this information to automatically generate forwarded-field annotations and attach them to the operators. This can significantly improve the performance of jobs that can reuse an existing partitioning or sort order.
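
As a rough illustration of the proposal, the sketch below derives a forwarded-fields string from a projection. It is not the Table API's code generator: the class `ForwardedFieldsSketch`, the `COMPUTED` marker, and the encoding of a projection as an `int[]` are assumptions made for this example. It also assumes tuple-style field names (`f0`, `f1`, ...) and uses the DataSet API's string syntax, where `f1->f2` means input field `f1` is copied to output field `f2` and entries are separated by semicolons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: derives a forwarded-fields
// string from a simple description of a projection.
final class ForwardedFieldsSketch {

    // Marks an output field that is computed rather than copied (assumed encoding).
    static final int COMPUTED = -1;

    // projection[out] is the index of the input field copied unchanged to
    // output position `out`, or COMPUTED if that output field is derived.
    static String forwardedFields(int[] projection) {
        List<String> parts = new ArrayList<>();
        for (int out = 0; out < projection.length; out++) {
            int in = projection[out];
            if (in == COMPUTED) {
                continue; // a computed field is not forwarded
            }
            // Same position: "fN"; moved to another position: "fN->fM".
            parts.add(in == out ? "f" + in : "f" + in + "->f" + out);
        }
        return String.join("; ", parts);
    }

    public static void main(String[] args) {
        // Output 0 copies input 0, output 1 is computed, output 2 copies input 1.
        System.out.println(forwardedFields(new int[] {0, COMPUTED, 1}));
        // prints "f0; f1->f2"
    }
}
```

In a generated DataSet program, a string like this could presumably be attached to the operator with `withForwardedFields(...)`, which has the same effect as placing a `@ForwardedFields` annotation on the function class.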