Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.5.1
Description
I have a use case that requires splitting a String-typed column using delimiters defined in other columns of the same DataFrame. SQL already supports this, but the Scala/Python `split` functions currently don't.
A hypothetical example to illustrate:
```scala
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works in SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently doesn't compile in Scala, but easy to support
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
```
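For readers unfamiliar with the three-argument form: the third argument caps the number of resulting parts, with the last element keeping the unsplit remainder. A minimal plain-Python sketch of those semantics (no Spark required; `split_with_limit` is a hypothetical helper, and Spark treats the delimiter as a regex, which `re.split` mirrors):

```python
import re


def split_with_limit(s: str, delim: str, limit: int) -> list:
    """Approximate Spark SQL's split(str, regex, limit) semantics."""
    if limit <= 0:
        # Non-positive limit: split as many times as possible.
        return re.split(delim, s)
    if limit == 1:
        # At most one part: the whole string, unsplit.
        return [s]
    # re.split's maxsplit counts splits, not parts, hence limit - 1.
    return re.split(delim, s, maxsplit=limit - 1)


# Per-row delimiter and limit, as in the example DataFrame above:
rows = [("Doe, John", ", ", 2), ("Smith,Jane", ",", 2), ("Johnson", ",", 1)]
for name, delim, n in rows:
    print(split_with_limit(name, delim, n))
```

This is only an illustration of the semantics the Column-based overload would expose; the actual implementation delegates to the existing SQL expression.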
This is a pretty simple patch; I can open a PR soon.