Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version: 3.1.2
Description
Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change that a) seems incorrect, and b) is undocumented in the migration guide:
3.0.2
scala> val df = spark.sql("SELECT '' AS col")
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
+---+--------+
|col|replaced|
+---+--------+
|   | <empty>|
+---+--------+
3.1.2
scala> val df = spark.sql("SELECT '' AS col")
df: org.apache.spark.sql.DataFrame = [col: string]

scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
+---+--------+
|col|replaced|
+---+--------+
|   |        |
+---+--------+
Note that the regular expression ^$ should match the empty string, but it doesn't in version 3.1. For example, this is the Java behavior:
scala> "".replaceAll("^$", "<empty>");
res1: String = <empty>
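For anyone who wants to check the expected semantics outside of Spark, here is a minimal sketch that uses java.util.regex directly. The object name EmptyStringRegexCheck and the helper javaReplaceAll are made up for illustration and are not part of Spark; the sketch only shows what plain Java regex does with ^$ and says nothing about how regexp_replace is implemented internally.

```scala
import java.util.regex.Pattern

// Hypothetical helper object (not part of Spark): exercises java.util.regex only.
object EmptyStringRegexCheck {
  // Replace every match of `regex` in `input`, the same way String.replaceAll does.
  def javaReplaceAll(input: String, regex: String, replacement: String): String =
    Pattern.compile(regex).matcher(input).replaceAll(replacement)

  def main(args: Array[String]): Unit = {
    // ^$ has a zero-length match at position 0 of the empty string.
    println(Pattern.compile("^$").matcher("").find())  // true
    // So the empty string is replaced ...
    println(javaReplaceAll("", "^$", "<empty>"))       // <empty>
    // ... while a non-empty string has no match and is left unchanged.
    println(javaReplaceAll("abc", "^$", "<empty>"))    // abc
  }
}
```

This is the result the 3.0.2 output above agrees with, so it is the behavior we would expect regexp_replace to keep.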