Spark / SPARK-39832

regexp_replace should support column arguments


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark

    Description

      F.regexp_replace in PySpark currently accepts only strings for the second and third arguments: https://github.com/apache/spark/blob/1df6006ea977ae3b8c53fe33630e277e8c1bc49c/python/pyspark/sql/functions.py#L3265

      In Scala, columns are also supported: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L2836

      The desire to use columns as arguments for the function has been raised previously on Stack Overflow: https://stackoverflow.com/questions/64613761/in-pyspark-using-regexp-replace-how-to-replace-a-group-with-value-from-another, where the suggested workaround was to use F.expr.
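
      A minimal sketch of that F.expr workaround, using a hypothetical DataFrame with columns named text, pattern, and replacement (names chosen for illustration, not taken from the linked question):

      {code:python}
      # Hypothetical data: the pattern and the replacement live in their own columns.
      from pyspark.sql import SparkSession
      import pyspark.sql.functions as F

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [("abc123", "[0-9]+", "#")],
          ["text", "pattern", "replacement"],
      )

      # Works today: the column-valued pattern and replacement are referenced inside
      # a SQL expression string, because F.regexp_replace itself only accepts Python
      # strings for those arguments.
      df.withColumn(
          "replaced", F.expr("regexp_replace(text, pattern, replacement)")
      ).show()
      {code}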

      It should be relatively straightforward to support both of the Scala function signatures in PySpark.
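
      For illustration, the requested call would then mirror the Scala API. This is a hedged sketch of the desired usage, reusing the hypothetical DataFrame from the example above; it is not the final implementation:

      {code:python}
      # Desired behavior (sketch): pattern and replacement passed as Columns directly,
      # matching the Column-based Scala signature, with no detour through F.expr.
      df.withColumn(
          "replaced",
          F.regexp_replace(F.col("text"), F.col("pattern"), F.col("replacement")),
      ).show()
      {code}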


          People

            Assignee: Brian Schaefer (physinet)
            Reporter: Brian Schaefer (physinet)
            Votes: 0
            Watchers: 3
