Spark / SPARK-22283

withColumn should replace multiple instances with a single one

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently, withColumn claims to do the following: "adding a column or replacing the existing column that has the same name."

      Unfortunately, if multiple existing columns have the same name (which is a normal occurrence after a join), every one of them is replaced and retained. The
      result contains several columns with the same name and the same value, and later references to that name produce messages about an ambiguous column.
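
      The situation described above can be reproduced with a small local job. This is only a sketch: the object name, the sample data, and the local master setting are illustrative assumptions, and the outcome noted in the comments is the behavior this report describes for 2.2.0.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical reproduction; names and data are illustrative only.
object WithColumnDuplicates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("withColumn-duplicates")
      .getOrCreate()
    import spark.implicits._

    val left = Seq((1, "a")).toDF("id", "value")
    val right = Seq((1, "b")).toDF("id", "value")

    // Joining on an expression keeps the columns of both sides,
    // so "id" and "value" each appear twice in the result.
    val joined = left.join(right, left("id") === right("id"))
    println(joined.columns.mkString(", ")) // id, value, id, value

    // Per this report, both "value" columns are replaced and both are kept.
    val replaced = joined.withColumn("value", lit("x"))
    println(replaced.columns.mkString(", "))

    // Selecting the duplicated name is expected to fail as ambiguous.
    println(scala.util.Try(replaced.select("value")).isFailure)

    spark.stop()
  }
}
```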

      The current implementation of withColumn contains this:

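        def withColumn(colName: String, col: Column): DataFrame = {
          // The resolver compares names according to the session's case-sensitivity setting.
          val resolver = sparkSession.sessionState.analyzer.resolver
          val output = queryExecution.analyzed.output
          val shouldReplace = output.exists(f => resolver(f.name, colName))
          if (shouldReplace) {
            // One output column per input column: every match is replaced in place,
            // so N matching columns become N copies of the new column.
            val columns = output.map { field =>
              if (resolver(field.name, colName)) {
                col.as(colName)
              } else {
                Column(field)
              }
            }
            select(columns : _*)
          } else {
            // No match: keep everything and append the new column.
            select(Column("*"), col.as(colName))
          }
        }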
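
      To see why the duplicates survive, the per-field mapping above can be modeled on plain column names without Spark. This is a simplified stand-in, not Spark code: `replaceInPlace` and the "(new)" tag are hypothetical, and the case-insensitive comparison only approximates the default resolver.

```scala
object CurrentWithColumnModel {
  // Hypothetical model: a column is reduced to its name, and a
  // replaced column is tagged "<name>(new)".
  def replaceInPlace(output: Seq[String], colName: String): Seq[String] = {
    // Assumption: case-insensitive matching, like the default resolver.
    val matches = (name: String) => name.equalsIgnoreCase(colName)
    if (output.exists(matches)) {
      // map is one-to-one: two matching inputs yield two replaced outputs.
      output.map(name => if (matches(name)) s"$colName(new)" else name)
    } else {
      // No match: append the new column at the end.
      output :+ s"$colName(new)"
    }
  }

  def main(args: Array[String]): Unit = {
    val joined = Seq("id", "value", "id", "value")
    println(replaceInPlace(joined, "value"))
    // List(id, value(new), id, value(new))
  }
}
```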
      

      Instead, I suggest something like the following, which drops all matching fields and appends a single instance of the new column:

        def withColumn(colName: String, col: Column): DataFrame = {
          val resolver = sparkSession.sessionState.analyzer.resolver
          val output = queryExecution.analyzed.output
          // Keep only the columns whose name does not match colName.
          val existing = output.filterNot(f => resolver(f.name, colName)).map(new Column(_))
          // Append exactly one new column; note that it now always goes last.
          select(existing :+ col.as(colName): _*)
        }
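
      Since the issue was resolved as Not A Problem, the same idea can be applied from user code instead. The helper below is a minimal sketch under stated assumptions: `WithColumnOnce` and `withColumnOnce` are hypothetical names, the case-insensitive comparison stands in for the session resolver, and it relies on `queryExecution` and the public `Column` constructor as they exist in the 2.2.0 API.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical user-level helper; not part of Spark.
object WithColumnOnce {
  def withColumnOnce(df: DataFrame, colName: String, col: Column): DataFrame = {
    // Keep every resolved output attribute whose name does not match.
    // Assumption: case-insensitive matching, as with the default
    // spark.sql.caseSensitive=false setting.
    val kept = df.queryExecution.analyzed.output
      .filterNot(_.name.equalsIgnoreCase(colName))
      .map(attr => new Column(attr))
    // A single select, so `col` may still refer to the columns being dropped.
    df.select(kept :+ col.as(colName): _*)
  }
}
```

      For example, `WithColumnOnce.withColumnOnce(joined, "value", left("value"))` would keep only the left-hand `value` from a join of `left` and `right`. Unlike the current in-place replacement, the new column ends up last.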
      

          People

            Assignee: Unassigned
            Reporter: Albert Meltzer (kitbellew)
            Votes: 0
            Watchers: 4