Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
Being able to call Hive UDFs from Flink SQL is important. A great many UDFs have been written in Hive over the last ten years, and being able to reuse them would reduce migration costs and bring more users to Flink.
Spark SQL already supports this; see:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_spark-guide/content/calling-udfs.html
The Hive UDFs in question include both built-in UDFs and custom UDFs. Since a great deal of business logic has been written as custom UDFs, those are even more important than the built-in ones.
Generally, there are three kinds of UDFs in Hive: UDF, UDTF and UDAF.
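For context, a custom scalar UDF (the first kind) typically extends Hive's org.apache.hadoop.hive.ql.exec.UDF base class. A minimal sketch, with a hypothetical class name, might look like this:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical example of a custom Hive scalar UDF; the class name is illustrative.
public class StringLengthUDF extends UDF {
    // Hive resolves evaluate() methods by reflection, based on argument types.
    public IntWritable evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new IntWritable(input.toString().length());
    }
}

Such a class is registered in Hive with DDL like CREATE FUNCTION str_len AS 'StringLengthUDF' (names here are illustrative); the point of this issue is that Flink SQL should be able to resolve and call the same class without it being rewritten.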
Here is the relevant Spark SQL documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive
Spark code:
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala