Spark / SPARK-37822

SQL function `split` should return an array of non-nullable elements


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, `split` returns the data type ArrayType(StringType), which declares that the resulting array may contain null elements. However, I do not see any case where the array can actually contain nulls.

      If either the input string or the delimiter is NULL, the output is a NULL array. If the string is empty, or there are no characters between delimiters, the output array contains empty strings but never NULLs. I therefore propose changing the return type of `split` so that its elements are marked as non-null.
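
The cases described above can be illustrated with a small plain-Python model. This is only a sketch of the claimed null behaviour, not Spark code: `split_model` is a hypothetical helper, and Python's `re.split` stands in for Spark's regex-based `split`, so regex dialect details may differ.

```python
import re
from typing import List, Optional


def split_model(s: Optional[str], delimiter: Optional[str]) -> Optional[List[str]]:
    """Hypothetical model of the null behaviour described in this issue."""
    # A NULL string or NULL delimiter yields a NULL array,
    # not an array that contains NULL elements.
    if s is None or delimiter is None:
        return None
    # re.split keeps empty strings for adjacent, leading, or trailing delimiters.
    return re.split(delimiter, s)


examples = [("a,b,c", ","), ("a,,c", ","), (",a,", ","), ("", ","), (None, ","), ("a,b", None)]
for s, d in examples:
    result = split_model(s, d)
    # Whenever the array itself is not null, none of its elements is null.
    assert result is None or all(part is not None for part in result)
    print(repr(s), repr(d), "->", result)
```

In this model the array as a whole can be null, but a non-null array never holds a null element, which is the property the proposed return type (an array of non-nullable strings) would express.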


          People

            Assignee: Shardul Mahadik (shardulm)
            Reporter: Shardul Mahadik (shardulm)
            Votes: 0
            Watchers: 4
