This is an umbrella JIRA tracking all enhancements and issues related to integrating Flink with the Hive ecosystem. It is the outcome of a discussion in the community; thanks to everyone who provided feedback and showed interest.
Specifically, we'd like to see the following features and capabilities immediately in Flink:
- Metadata interoperability
- Data interoperability
- Data type compatibility
- Hive UDF support
- DDL/DML/Query language compatibility
In the longer term, we'd also like to add or improve:
- Compatible SQL service, client tools, JDBC/ODBC drivers
- Better task failure tolerance and task scheduling
- Support for other user customizations in Hive (storage handlers, SerDes, etc.)
I will provide more details about the proposal in a doc shortly. Design docs, if deemed necessary, will be provided in the related subtasks under this JIRA.
Feedback and contributions are very welcome!
||Sub-task||Status||Assignee||
|Flink SQL calling Hive User-Defined Functions|Open| |
|Support for simple hive UDF|Open| |
|Support for Hive GenericUDF|In Progress| |
|Support for Hive User-defined Table-generating Function (UDTF)|Open| |
|Support for Hive's User-defined Aggregation Function (UDAF)|Open| |
|Add user documentation for Flink-Hive integration|Open|Unassigned|