Details
Description
Right now, our support for and internal implementation of many functions have a few issues. Specifically:
- UDFs don't know their input types and thus don't perform type coercion.
- We hard-code a bunch of built-in functions into the parser. This is bad because, in SQL, it creates new reserved words for things that aren't actually keywords. It also means that for each function we need to add support to both SQLContext and HiveContext separately.
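To illustrate the first problem, here is a minimal sketch (the function name and body are hypothetical; the `registerFunction` call reflects the current API described above). Because only the Scala closure is captured, the analyzer has no declared argument types to coerce against:

```scala
// Register a UDF expecting a Double. Nothing records that the argument
// must be a Double, so the analyzer cannot insert a cast.
sqlContext.registerFunction("cube", (x: Double) => x * x * x)

// An integer literal reaches the UDF uncoerced; with declared input types,
// analysis could rewrite the argument as Cast(Literal(2), DoubleType).
sqlContext.sql("SELECT cube(2)")
```

This is exactly the gap the typed interfaces below are meant to close.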
For this JIRA I propose we do the following:
- Change the interfaces of registerFunction and ScalaUdf to include the types of the input arguments as well as the output type.
- Add an analysis rule that performs type coercion for UDFs.
- Add a parse rule for functions to SQLParser.
- Rewrite all the UDFs that are currently hacked into the various parsers to use this new functionality.
Depending on how big this refactoring becomes, we could split parts 1 and 2 from part 3 above.
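A rough sketch of what parts 1 and 2 could look like. All names here (the `inputTypes` field, the `CoerceUdfArguments` rule) are assumptions for illustration, not a final API; the real `ScalaUdf` expression and Catalyst `Rule` machinery may differ in detail:

```scala
// Part 1: ScalaUdf carries declared input types alongside the output type.
case class ScalaUdf(
    function: AnyRef,
    dataType: DataType,          // output type
    inputTypes: Seq[DataType],   // NEW: declared argument types
    children: Seq[Expression])
  extends Expression

// Part 2: an analysis rule that wraps each argument in a Cast
// whenever its resolved type differs from the declared input type.
object CoerceUdfArguments extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case udf @ ScalaUdf(_, _, inputTypes, children)
        if children.forall(_.resolved) =>
      val coerced = children.zip(inputTypes).map {
        case (child, expected) if child.dataType != expected =>
          Cast(child, expected)
        case (child, _) => child
      }
      udf.copy(children = coerced)
  }
}
```

With this in place, `registerFunction` would record the input types at registration time, and the analyzer (rather than each caller) becomes responsible for inserting casts.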
Attachments
Issue Links
- blocks
  - SPARK-4559 Adding support for ucase and lcase (Resolved)
  - SPARK-5215 concat support in sqlcontext (Resolved)
  - SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL (Closed)
  - SPARK-4151 Add string operation function trim, ltrim, rtrim, length to support SparkSql (HiveQL) (Closed)
- is related to
  - SPARK-7886 Add built-in expressions to FunctionRegistry (Resolved)
- relates to
  - SPARK-2863 Emulate Hive type coercion in native reimplementations of Hive functions (Closed)