Description
Hello all,
I ran into a use case in a project with Spark SQL and want to share with you some thoughts about the function array_contains.
Say I have a DataFrame containing two columns: Column A of type "Array of String" and Column B of type "String". I want to determine whether the value of Column B is contained in the value of Column A, without using a UDF of course.
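For illustration, a minimal sketch of such a DataFrame (the column names and sample values here are made up for the example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("array_contains-demo").getOrCreate()
import spark.implicits._

// ColumnA: Array of String, ColumnB: String
val df = Seq(
  (Seq("a", "b", "c"), "b"),  // "b" is in the array
  (Seq("x", "y"), "z")        // "z" is not
).toDF("ColumnA", "ColumnB")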
The function array_contains came into my mind naturally:
def array_contains(column: Column, value: Any): Column = withExpr {
  ArrayContains(column.expr, Literal(value))  // wraps value in a Literal
}
However, the function takes Column B and wraps it in a "Literal", which yields a runtime exception: RuntimeException("Unsupported literal type " + v.getClass + " " + v).
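Concretely, the straightforward call compiles, because value is typed Any, but blows up at runtime with the exception above; a sketch, assuming the df defined earlier:

import org.apache.spark.sql.functions.{array_contains, col}

// Throws at runtime: Unsupported literal type class org.apache.spark.sql.Column ...
df.withColumn("match", array_contains(col("ColumnA"), col("ColumnB")))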
Then, after discussion with my friends, we found a solution that does not use a UDF:
new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))
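Put together with the required imports, the workaround looks roughly like this (note that ArrayContains comes from the internal catalyst package, so this leans on Spark internals):

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.ArrayContains
import org.apache.spark.sql.functions.col

// Build the ArrayContains expression directly, bypassing the Literal conversion
val matches = new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))
df.withColumn("match", matches).show()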
Building on this solution, I think the function itself could be made a little more powerful, like this:
def array_contains(column: Column, value: Any): Column = withExpr {
  value match {
    case c: Column => ArrayContains(column.expr, c.expr)
    case _ => ArrayContains(column.expr, Literal(value))
  }
}
It pattern matches to detect whether value is of type Column. If so, it uses the column's .expr directly; otherwise it works as it used to.
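With such a change, both the existing literal form and the new Column form would work; a sketch of the intended usage, assuming the patched function above:

// Existing behavior: value is a plain literal
df.withColumn("has_b", array_contains(col("ColumnA"), "b"))

// New behavior: value is another Column
df.withColumn("match", array_contains(col("ColumnA"), col("ColumnB")))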