Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Implemented
Description
In order to support input formats for the Table API, interfaces are necessary. I propose two types of TableSources:
- AdaptiveTableSources can adapt their output to the requirements of the plan. The output schema stays the same, but the TableSource can react internally to field resolution and/or predicates and return adapted DataSet/DataStream versions in the "translate" step.
- StaticTableSources are an easy way to provide the Table API with additional input formats without much implementation effort (e.g. for fromCsvFile())
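A minimal sketch of how the two proposed source types could be separated. This is not Flink's actual API: every interface, method, and class name here (`notifyResolvedField`, `notifyPredicate`, `translate`, `InMemoryAdaptiveSource`) is a hypothetical choice for illustration, and a plain `List<Object[]>` stands in for DataSet/DataStream. The example adaptive source keeps the row arity fixed and simply skips materializing fields the plan never resolved, which is only one possible way to adapt.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical base interface: every source exposes a fixed schema. */
interface TableSource {
    List<String> getFieldNames();
}

/** Static variant: always produces the same rows (List stands in for DataSet/DataStream). */
interface StaticTableSource extends TableSource {
    List<Object[]> createRows();
}

/** Adaptive variant: learns which fields and predicates the plan uses before translation. */
interface AdaptiveTableSource extends TableSource {
    void notifyResolvedField(String fieldName, boolean filtering);
    void notifyPredicate(String predicate);
    List<Object[]> translate();
}

/** Example adaptive source that leaves fields the plan never touches unset (null). */
class InMemoryAdaptiveSource implements AdaptiveTableSource {
    private final List<String> fieldNames;
    private final List<Object[]> rows;
    private final Set<String> usedFields = new LinkedHashSet<>();
    private final List<String> predicates = new ArrayList<>();

    InMemoryAdaptiveSource(List<String> fieldNames, List<Object[]> rows) {
        this.fieldNames = fieldNames;
        this.rows = rows;
    }

    @Override
    public List<String> getFieldNames() {
        return fieldNames;
    }

    @Override
    public void notifyResolvedField(String fieldName, boolean filtering) {
        // This sketch treats filtering and selection usage alike: the field is needed.
        usedFields.add(fieldName);
    }

    @Override
    public void notifyPredicate(String predicate) {
        // Recorded only; a real source could evaluate it while reading.
        predicates.add(predicate);
    }

    List<String> getPredicates() {
        return predicates;
    }

    @Override
    public List<Object[]> translate() {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            // Same arity as the input row, so the output schema is unchanged.
            Object[] adapted = new Object[fieldNames.size()];
            for (int i = 0; i < adapted.length; i++) {
                if (usedFields.contains(fieldNames.get(i))) {
                    adapted[i] = row[i];
                }
            }
            out.add(adapted);
        }
        return out;
    }
}
```

A static source only has to implement `createRows()`, which is what keeps the implementation effort low for simple formats; the adaptive source pays for its extra callbacks with the chance to read less data.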
TableSources need to be deeply integrated into the Table API.
The TableEnvironment requires a newly introduced AbstractExecutionEnvironment (common super class of all ExecutionEnvironments for DataSets and DataStreams).
Here is what each TableSource can see for a more complicated query:
getTableJava(tableSource1)
  .filter("a===5 || a===6")
  .select("a as a4, b as b4, c as c4")
  .filter("b4===7")
  .join(getTableJava(tableSource2))
  .where("a===a4 && c==='Test' && c4==='Test2'")

// Result predicates for tableSource1:
// List("a===5 || a===6", "b===7", "c==='Test2'")
// Result predicates for tableSource2:
// List("c==='Test'")
// Result resolved fields for tableSource1 (true = filtering, false = selection):
// Set(("a", true), ("a", false), ("b", true), ("b", false), ("c", false), ("c", true))
// Result resolved fields for tableSource2 (true = filtering, false = selection):
// Set(("a", true), ("c", true))
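In the result above, the predicate `b4===7` reaches tableSource1 as `b===7`, so aliases introduced by `select` must be mapped back to the source's own field names before a predicate is handed over. A rough sketch of that one step follows, assuming predicates are plain strings as in the example. `AliasResolver`, `parseSelect`, and `resolve` are hypothetical names, and the identifier matching is deliberately naive: it would also rewrite a string literal that happens to equal an alias.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that maps select aliases back to source field names. */
class AliasResolver {
    // Naive identifier pattern; it does not distinguish fields from quoted literals.
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Parses "a as a4, b as b4" into a map from alias to original field name. */
    static Map<String, String> parseSelect(String select) {
        Map<String, String> aliasToField = new LinkedHashMap<>();
        for (String item : select.split(",")) {
            String[] parts = item.trim().split("\\s+as\\s+");
            if (parts.length == 2) {
                aliasToField.put(parts[1].trim(), parts[0].trim());
            }
        }
        return aliasToField;
    }

    /** Replaces every aliased identifier in the predicate with its original field name. */
    static String resolve(String predicate, Map<String, String> aliasToField) {
        Matcher m = IDENT.matcher(predicate);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group();
            String original = aliasToField.getOrDefault(name, name);
            m.appendReplacement(sb, Matcher.quoteReplacement(original));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With the aliases from `select("a as a4, b as b4, c as c4")`, `resolve("b4===7", aliases)` yields `b===7` and `resolve("c4==='Test2'", aliases)` yields `c==='Test2'`, matching the predicates listed for tableSource1. Splitting the `where` conjunction and deciding which conjunct belongs to which source (the join condition `a===a4` touches both and appears in neither predicate list) is left out of this sketch.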
Issue Links
- is required by
  - FLINK-2166 Add fromCsvFile() to TableEnvironment (Closed)
  - FLINK-2167 Register HCatalog as external catalog in TableEnvironment (Closed)
  - FLINK-2168 Add HBaseTableSource (Closed)
  - FLINK-2169 Add ParquetTableSource (Closed)
  - FLINK-2170 Add OrcTableSource (Closed)