  1. Flink
  2. FLINK-12308

Support python language in Flink Table API

    Details

      Description

      At the API level, Flink offers the DataStream API, the DataSet API, and the Table API & SQL, and the Table API will become the first-class citizen. The Table API is declarative and can be optimized automatically, as mentioned in the Flink mid-term roadmap by Stephan. We therefore first consider supporting Python at the Table API level, to cater to the current large number of analytics users. Flink's goals for the Python Table API are as follows:

      • Users can write Flink Table API jobs in Python, and the Python API should mirror the Java / Scala Table API
      • Users can submit Python Table API jobs in the following ways:
        • Submit a job with a Python script, integrated with `flink run`
        • Submit a job with a Python script via the REST service
        • Submit a job interactively, similar to `scala-shell`
        • Debug locally in an IDE
      • Users can write custom functions (UDF, UDTF, UDAF)
      • Pandas functions can be used in the Flink Python Table API
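
      The three kinds of custom functions listed above differ in their input/output cardinality. The following is a minimal plain-Python sketch of that difference only; it does not use the PyFlink API. All names (`to_upper`, `split_words`, `CountAgg`, `run_example`) are hypothetical, and the accumulator method names are only loosely modeled on the Java aggregate function interface.

```python
def to_upper(word):
    """Scalar function (UDF): one input value -> exactly one output value."""
    return word.upper()


def split_words(line):
    """Table function (UDTF): one input row -> zero or more output rows."""
    for word in line.split():
        yield word


class CountAgg:
    """Aggregate function (UDAF): many input rows -> one value via an accumulator.

    Hypothetical class; method names are illustrative, not the PyFlink API.
    """

    def create_accumulator(self):
        # A one-element list serves as a mutable counter.
        return [0]

    def accumulate(self, acc, value):
        acc[0] += 1

    def get_value(self, acc):
        return acc[0]


def run_example(lines):
    """Apply the three function kinds to a small in-memory input."""
    agg = CountAgg()
    acc = agg.create_accumulator()
    words = []
    for line in lines:
        for word in split_words(line):      # UDTF: expands a row
            words.append(to_upper(word))    # UDF: maps a value
            agg.accumulate(acc, word)       # UDAF: folds rows into one value
    return words, agg.get_value(acc)
```

      In this sketch an empty input line produces no rows from the table function, which is the case a scalar function cannot express.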

      A more detailed description can be found in FLIP-38.

      At the API level, the plan is as follows:

      • Short term:
        We may initially go with a simple approach that maps the Python Table API to the Java Table API via Py4J.
      • Long term:
        We may need to create a Python API that follows the same structure as Flink's Table API and produces a language-independent DAG (as Stephan already mentioned on the mailing list thread).
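
      The short-term approach amounts to a thin wrapper pattern: each Python object holds a reference to a Java object and forwards method calls to it. The following is a minimal sketch of that delegation, assuming a stand-in class in place of the real Py4J proxy so that it runs without a JVM. `FakeJavaTable`, `Table`, and `build_plan` are hypothetical names for illustration and are not taken from the actual implementation.

```python
class FakeJavaTable:
    """Stand-in for the Java Table object that Py4J would proxy (hypothetical)."""

    def __init__(self, ops=None):
        self.ops = list(ops or [])

    def select(self, fields):
        return FakeJavaTable(self.ops + ["select(" + fields + ")"])

    def filter(self, predicate):
        return FakeJavaTable(self.ops + ["filter(" + predicate + ")"])

    def explain(self):
        return " -> ".join(self.ops)


class Table:
    """Python-side wrapper: keeps a reference to the Java object and forwards calls."""

    def __init__(self, j_table):
        # With Py4J, this would be a proxy to an object living in the JVM.
        self._j_table = j_table

    def select(self, fields):
        # Delegate to the Java side and wrap the result again.
        return Table(self._j_table.select(fields))

    def filter(self, predicate):
        return Table(self._j_table.filter(predicate))

    def explain(self):
        return self._j_table.explain()


def build_plan():
    """Chain operations on the wrapper; all logic runs on the (fake) Java side."""
    source = Table(FakeJavaTable(["scan(orders)"]))
    return source.filter("amount > 10").select("user, amount").explain()
```

      Because the wrapper contains no logic of its own, the Python API can track the Java API closely, at the cost of requiring a JVM process on the other end of the gateway.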

        Sub-Tasks

        1. Add base python framework and Add Scan, Projection, and Filter operator support (Sub-task, Closed, Wei Zhong)
        2. Add a basic test framework, just like the existing Java TableAPI, abstract some TestBase. (Sub-task, Closed, Dian Fu)
        3. Add simplicity support for submitting Python Table API job in CliFrontend, i.e. `flink run -py wordcount.py` can be work(with simple test). (Sub-task, Closed, Huang Xingbo)
        4. Add integrated Tox for ensuring compatibility with the python2/3 version (Sub-task, Closed, sunjincheng)
        5. Integrated Travis for Python Table API (Sub-task, Closed, Wei Zhong)
        6. Add all table operators align Java Table API (Sub-task, Closed, Wei Zhong)
        7. Support to define all kinds of types in Python API (Sub-task, Closed, Dian Fu)
        8. Adds from_elements in TableEnvironment (Sub-task, Closed, Dian Fu)
        9. Add FileSystem Connector with CSV format support in Python Table API (Sub-task, Closed, Wei Zhong)
        10. Add all connector support align Java Table API (Sub-task, Closed, Wei Zhong)
        11. Add custom check options for lint-python.sh (Sub-task, Closed, sunjincheng)
        12. Add a tool to check the user interface of Python Table API aligns with Java Table API. (Sub-task, Closed, sunjincheng)
        13. Move PythonGatewayServer into flink-clients (Sub-task, Closed, sunjincheng)
        14. Add deploy a Python Flink job and session cluster on Kubernetes support. (Sub-task, Reopened, Dian Fu)
        15. Reduce the test cost for Python API (Sub-task, Closed, Wei Zhong)
        16. Add all format support align with the Java Table API (Sub-task, Closed, Wei Zhong)
        17. Align Stream/BatchTableEnvironment with JAVA Table API (Sub-task, Closed, Wei Zhong)
        18. Add TableSchema for Python Table API (Sub-task, Closed, Wei Zhong)
        19. Align the Python data types with Java (Sub-task, Closed, Dian Fu)
        20. Enable the configuration of using blink planner (Sub-task, Closed, Wei Zhong)
        21. Add an interactive shell for Python Table API (Sub-task, Closed, Huang Xingbo)
        22. Add windows support for the Python shell script (Sub-task, Open, Wei Zhong)
        23. Add the Python catalog API (Sub-task, Closed, Dian Fu)
        24. Add the Python Table API Sphinx docs (Sub-task, Closed, Huang Xingbo)
        25. Adds Python Table API tutorial (Sub-task, Closed, Dian Fu)
        26. Adds a wiki page about setting up a Python Table API development environment (Sub-task, Closed, sunjincheng)
        27. Improves the Python word_count example to use the descriptor API (Sub-task, Closed, Dian Fu)
        28. Support user defined connectors/format (Sub-task, Closed, Dian Fu)
        29. Allow to specify directory in option -pyfs (Sub-task, Closed, Dian Fu)
        30. Add support to run a Python job-specific cluster on Kubernetes (Sub-task, Closed, Dian Fu)
        31. Correct the package name for python API (Sub-task, Closed, Dian Fu)
        32. Add support for build Python Docs in Buildbot (Sub-task, Closed, sunjincheng)
        33. Improves the performance of Python Table API test cases (Sub-task, Closed, Dian Fu)
        34. Improve the Python Table API docs by adding more examples (Sub-task, Closed, Wei Zhong)
        35. Allows pyflink to be pip installed (Sub-task, Closed, Wei Zhong)
        36. Release the PyFlink into PyPI (Sub-task, Closed, sunjincheng)
        37. Supported java UDFs in python API (Sub-task, Closed, Dian Fu)

            People

            • Assignee: sunjincheng (sunjincheng121)
            • Reporter: sunjincheng (sunjincheng121)
            • Votes: 0
            • Watchers: 12

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 12h