At the Flink API level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API will become the first-class citizen. Table API is declarative, and can be automatically optimized, which is mentioned in the Flink mid-term roadmap by Stephan. So, first considering supporting Python at the Table level to cater to the current large number of analytics users. And Flink's goal for Python Table API as follows:
- Users can write Flink Table API job in Python, and should mirror Java / Scala Table API
- Users can submit Python Table API job in the following ways:
- Submit a job with python script, integrate with `flink run`
- Submit a job with python script by REST service
- Submit a job in an interactive way, similar `scala-shell`
- Local debug in IDE.
- Users can write custom functions(UDF, UDTF, UDAF)
- Pandas functions can be used in Flink Python Table API
A more detailed description can be found in FLIP-38.
For the API level, we make the following plan：
- The short-term:
We may initially go with a simple approach to map the Python Table API to the Java Table API via Py4J.
- The long-term:
We may need to create a Python API that follows the same structure as Flink's Table API that produces the language-independent DAG. (As Stephan already motioned on the mailing thread)