[FLINK-18761] Support Python DataStream API (Stateless part)

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.12.0
    • None

    Description

      This is the umbrella Jira for FLIP-130, which adds support for the stateless part of the Python DataStream API.

      FLIP wiki page: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298

      Flink provides three layered APIs: the ProcessFunctions, the DataStream API, and the SQL & Table API. Each API offers a different trade-off between conciseness and expressiveness and targets different use cases.

      PyFlink already supports the SQL & Table API. That API provides relational operations as well as user-defined functions, which is convenient for users who are familiar with Python and relational programming.

      The DataStream API and the ProcessFunctions, by contrast, provide more generic APIs for implementing stream processing applications. The ProcessFunctions expose time and state, which are the fundamental building blocks of any streaming application. To cover more use cases, we plan to support all of these APIs in PyFlink.

      In this FLIP, we propose to support the stateless part of the Python DataStream API. For more details, please refer to the FLIP wiki page. The stateful part will follow in later work.
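
      To make the scope concrete, the following is a minimal sketch of a pipeline built from operations named in the sub-tasks of this issue (from_collection, flat_map, filter, map, key_by, reduce, print). It is an illustration, not code from the FLIP: the helper names (split_words, is_not_stopword, to_pair, sum_counts, run_job), the sample input, and the job name are all assumptions, and exact method signatures may differ between PyFlink releases.

```python
# Illustrative sketch only: helper names and sample data are hypothetical.

STOPWORDS = {"a", "the", "of"}


def split_words(line):
    """flat_map function: emit zero or more lowercase words per input line."""
    for word in line.lower().split():
        yield word


def is_not_stopword(word):
    """filter predicate: keep only words that are not in STOPWORDS."""
    return word not in STOPWORDS


def to_pair(word):
    """map function: turn a word into a (word, 1) pair."""
    return (word, 1)


def sum_counts(left, right):
    """reduce function: merge two pairs that share the same key."""
    return (left[0], left[1] + right[1])


def run_job():
    # PyFlink is imported here so the helpers above can be exercised
    # without a Flink installation.
    from pyflink.common.typeinfo import Types
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    pair_type = Types.TUPLE([Types.STRING(), Types.INT()])
    lines = env.from_collection(
        ["the quick fox", "the lazy dog", "quick quick"],
        type_info=Types.STRING(),
    )

    (lines
        .flat_map(split_words, Types.STRING())   # one line -> many words
        .filter(is_not_stopword)                 # drop stopwords
        .map(to_pair, pair_type)                 # word -> (word, 1)
        .key_by(lambda pair: pair[0])            # partition by word
        .reduce(sum_counts)                      # running count per word
        .print())

    env.execute("word_count_sketch")
```

      Calling run_job() in an environment with PyFlink installed submits the job. The user functions are plain Python callables, so they can be unit-tested on their own before being wired into a stream.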

        Issue Links

          1. Support basic TypeInformation for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          2. Support from_collection for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          3. Support map() and flat_map() for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          4. Support add_sink() for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          5. Support add_source() to get a DataStream for Python StreamExecutionEnvironment (Sub-task, Closed, Shuiqiang Chen)
          6. Support read_text_file() and print() interface for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          7. Support key_by() operation for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          8. Support filter() operation for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          9. Support conversion between Table and DataStream (Sub-task, Closed, Shuiqiang Chen)
          10. Support dependency management for Python StreamExecutionEnvironment (Sub-task, Closed, Shuiqiang Chen)
          11. Support reduce() operation for Python KeyedStream (Sub-task, Closed, Hequn Cheng)
          12. Add chaining strategy and slot sharing group interfaces for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          13. Add partitioning interfaces for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          14. Support execute_async for StreamExecutionEnvironment (Sub-task, Closed, Shuiqiang Chen)
          15. Support Row Serialization and Deserialization schemas for DataStream source/sink (Sub-task, Closed, Shuiqiang Chen)
          16. Support Kafka connectors for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          17. Support CoMapFunction for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          18. Support CoFlatMap for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          19. Support key_by() on ConnectedStreams for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          20. Support partitionCustom() operation for Python DataStream API (Sub-task, Closed, shuiqiangchen)
          21. Support Cassandra connector for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          22. Support JDBC connector for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          23. Support Streaming File Sink for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          24. Add documentation for DataTypes in Python DataStream API (Sub-task, Closed, Hequn Cheng)
          25. Update the Sphinx doc for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          26. Add end to end test for Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          27. Add ElasticSearch connector for Python DataStream API (Sub-task, Closed, Luning Wang)
          28. Add documentation for Operations in Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          29. Add documentation for dependency management in Python DataStream API (Sub-task, Closed, Shuiqiang Chen)
          30. Add documentation for connectors in Python DataStream API (Sub-task, Closed, Unassigned)
          31. Add tutorial documentation for Python DataStream API (Sub-task, Closed, Hequn Cheng)
          32. Support Kinesis connector in Python DataStream API (Sub-task, Closed, pengmd)

            People

              Assignee: Unassigned
              Reporter: Hequn Cheng (hequn8128)
