  1. Spark
  2. SPARK-18277

na.fill() and friends should work on struct fields

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      It appears that you cannot use fill() and friends to quickly modify struct fields.

      For example:

      >>> from pyspark.sql import Row
      >>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=None), c=None)])
      >>> df.printSchema()
      root
       |-- a: struct (nullable = true)
       |    |-- b: string (nullable = true)
       |-- c: string (nullable = true)
      
      >>> df.show()
      +-----------+-------+
      |          a|      c|
      +-----------+-------+
      |[yeah yeah]|alright|
      |     [null]|   null|
      +-----------+-------+
      
      >>> df.na.fill('').show()
      +-----------+-------+
      |          a|      c|
      +-----------+-------+
      |[yeah yeah]|alright|
      |     [null]|       |
      +-----------+-------+
      

      c got filled in, but a.b didn't.

      I don't know if it's "appropriate", but it would be nice if fill() and friends worked automatically on struct fields.

      As things are today, there doesn't appear to be a way to fill in null values inside structs. If you try when(), you find that when(col('a.b') is None, '') cannot work: Python's is operator cannot be overloaded, so Column has no way to intercept it and the condition is just a plain boolean. And if you try when(col('a.b') == None, '') it doesn't catch the null values either, presumably because the comparison becomes a SQL = NULL, which never evaluates to true.
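
The difference between the two failed attempts can be shown without Spark. The `Expr` class below is a toy stand-in written for this illustration, not PySpark's `Column`; it only mimics the idea that `==` builds an expression while `is` is evaluated by Python on the spot. PySpark's real `Column` does provide an `isNull()` method, which the hypothetical `is_null()` here is modeled on and which may be the check to reach for inside `when()`.

```python
class Expr:
    """Toy column expression that records the SQL it would produce."""

    def __init__(self, sql):
        self.sql = sql

    def __eq__(self, other):
        # `==` can be overloaded, so it returns a new expression, not a bool.
        rhs = "NULL" if other is None else repr(other)
        return Expr(f"({self.sql} = {rhs})")

    def is_null(self):
        # Explicit null test, analogous to an IS NULL predicate.
        return Expr(f"({self.sql} IS NULL)")


col_ab = Expr("a.b")

# `is` compares object identity and cannot be overloaded:
# Python evaluates it immediately to a plain bool.
identity_check = col_ab is None

# `==` builds an expression, but `x = NULL` is never true in SQL.
equality_check = col_ab == None  # noqa: E711 (deliberate, mirrors the report)

# An explicit null predicate is what actually matches null values.
null_check = col_ab.is_null()

print(identity_check)
print(equality_check.sql)
print(null_check.sql)
```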

              People

              • Assignee: Unassigned
              • Reporter: Nicholas Chammas (nchammas)
              • Votes: 0
              • Watchers: 4
