Spark / SPARK-33795

gapply fails execution with rbind error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None
    • Environment: Databricks runtime 7.3 LTS ML

    Description

      Executing the following code on Databricks runtime 7.3 LTS ML fails with an rbind error, whereas the same code runs successfully when Arrow is not enabled in the Spark session. The full error message is attached.

       

      ```
      library(dplyr)
      library(SparkR)

      # Enable Arrow optimization for SparkR
      SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))

      mtcars %>%
        SparkR::as.DataFrame() %>%
        SparkR::gapply(
          x = .,
          cols = c("cyl", "vs"),
          func = function(key, data) {
            dt <- data[, c("mpg", "qsec")]
            res <- apply(dt, 2, mean)
            df <- data.frame(firstGroupKey = key[1], secondGroupKey = key[2], mean_mpg = res[1], mean_cyl = res[2])
            return(df)
          },
          schema = structType(
            structField("cyl", "double"),
            structField("vs", "double"),
            structField("mpg_mean", "double"),
            structField("qsec_mean", "double")
          )
        ) %>%
        display()
      ```
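
      For comparison, the variant the description reports as working differs only in the session configuration. A minimal sketch, assuming Arrow is turned off through the same `sparkConfig` entry used in the reproduction (the report does not show the exact call that was used):

      ```r
      library(SparkR)

      # Assumption: same session call as in the reproduction, with the Arrow
      # optimization for SparkR switched off instead of on.
      SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "false"))
      ```

      The rest of the pipeline (`as.DataFrame`, `gapply`, `display`) would stay unchanged.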


          People

            Assignee: Unassigned
            Reporter: n8shdw MvR