Spark / SPARK-36642

Add df.withMetadata: syntactic sugar to update the metadata of a DataFrame

Details

    • Type: Story
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      To make semantic annotations easier to use and modify, we want a shorter API for updating the metadata of a column in a DataFrame.

      Currently we have

      df.withColumn("col1", col("col1").alias("col1", metadata=metadata))
      

      to update the metadata without changing the column name, which is too verbose. We want a syntactic-sugar API

      df.withMetadata("col1", metadata=metadata)
      

      to achieve the same functionality.

      Some background on how often metadata gets updated: we are working on inferring semantic data types, using them in AutoML, and storing the semantic annotations in the metadata. In many cases we will therefore suggest that the user update the metadata, either to correct a wrong inference or to manually add an annotation where the inference is weak.

          People

            liangz Liang Zhang
            liangz Liang Zhang
            hyukjin.kwon hyukjin.kwon