SPARK-46948

Support higher dimensional array return values in predict_batch_udf beyond 1 or 2 dimensional arrays


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      pyspark.ml.functions.predict_batch_udf does not support return types with more than 2 dimensions:

      https://github.com/apache/spark/pull/37734#discussion_r1016156053

      Many computer vision models return ndarrays with 3 or 4 dimensions. For example, segmentation models return 3 dimensions, [Category, H, W], and a time dimension, if present, adds a fourth.
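
      Until higher-dimensional return values are supported, one possible workaround is to flatten each prediction so the batch result is 2-dimensional, then restore the original shape downstream. The sketch below uses only NumPy to illustrate the idea. The model, the shape constants, and the helper names (`fake_segmentation_model`, `predict_flat`, `restore_shape`) are hypothetical and not part of Spark; in practice a function like `predict_flat` would be the one returned by the `make_predict_fn` passed to `predict_batch_udf`, with an array return type.

```python
import numpy as np

# Assumed per-image output shape for a segmentation model: [Category, H, W].
NUM_CLASSES, HEIGHT, WIDTH = 3, 4, 5


def fake_segmentation_model(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a real model: (N, H, W) -> (N, C, H, W)."""
    channels = [batch * (c + 1) for c in range(NUM_CLASSES)]
    return np.stack(channels, axis=1).astype(np.float32)


def predict_flat(batch: np.ndarray) -> np.ndarray:
    """Flatten each row's prediction so the batch result is 2-D: (N, C*H*W)."""
    out = fake_segmentation_model(batch)
    return out.reshape(out.shape[0], -1)


def restore_shape(flat_row) -> np.ndarray:
    """Rebuild one [Category, H, W] array from a flattened row."""
    return np.asarray(flat_row, dtype=np.float32).reshape(NUM_CLASSES, HEIGHT, WIDTH)


# A batch of two hypothetical single-channel images.
batch = np.arange(2 * HEIGHT * WIDTH, dtype=np.float32).reshape(2, HEIGHT, WIDTH)
flat = predict_flat(batch)        # 2-D result, one flattened prediction per row
restored = restore_shape(flat[0])  # back to [Category, H, W] for the first image
```

      The cost of this approach is that the shape is no longer carried by the column itself, so the consumer has to know `NUM_CLASSES`, `HEIGHT`, and `WIDTH` (or read them from a separate column) to reshape correctly.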

          People

            Assignee: Unassigned
            Reporter: Ryan Avery (rbavery)
