Spark / SPARK-33795

gapply fails execution with rbind error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None
    • Environment: Databricks runtime 7.3 LTS ML

    Description

      Executing the following code on Databricks Runtime 7.3 LTS ML fails with an rbind error when Arrow is enabled in the Spark session. The same code runs successfully when Arrow is not enabled. The full error message is attached.

       

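      ```
      library(dplyr)
      library(SparkR)

      # Arrow optimization for SparkR; the rbind error occurs only with this set to "true"
      SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))

      mtcars %>%
        SparkR::as.DataFrame() %>%
        SparkR::gapply(
          x = .,
          cols = c("cyl", "vs"),
          func = function(key, data) {
            # Column means of mpg and qsec within each (cyl, vs) group
            dt <- data[, c("mpg", "qsec")]
            res <- apply(dt, 2, mean)
            # The returned column names differ from those declared in `schema`,
            # and mean_cyl holds res[2], which is the mean of qsec
            df <- data.frame(firstGroupKey = key[1],
                             secondGroupKey = key[2],
                             mean_mpg = res[1],
                             mean_cyl = res[2])
            return(df)
          },
          schema = structType(structField("cyl", "double"),
                              structField("vs", "double"),
                              structField("mpg_mean", "double"),
                              structField("qsec_mean", "double"))
        ) %>%
        display()  # Databricks notebook function; SparkR::showDF() works elsewhere
      ```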
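      For readers without a Spark cluster, the per-group logic of the reproducer can be approximated locally. This is a hypothetical, Spark-free sketch in base R: it splits mtcars by (cyl, vs), applies the same column-mean computation to each group, and row-binds the results, mirroring gapply's split-apply-combine semantics. It assumes gapply passes the grouping values to func as a list, and it uses key[[1]] rather than key[1] to extract scalar values; the names group_fn, groups, results, out and schema_names are illustrative only. It does not exercise the Arrow code path, so it cannot reproduce the reported rbind error; it only makes the returned column names easy to compare with the names declared in the schema.

      ```r
      # Hypothetical local analogue of the gapply call (base R only, no Spark, no Arrow).
      # It does NOT reproduce the Arrow code path where the reported error occurs.
      group_fn <- function(key, data) {
        dt <- data[, c("mpg", "qsec")]
        res <- apply(dt, 2, mean)               # named numeric: c(mpg = ..., qsec = ...)
        data.frame(firstGroupKey = key[[1]],    # key[[1]] is a scalar; key[1] would be a list
                   secondGroupKey = key[[2]],
                   mean_mpg = res[["mpg"]],
                   mean_cyl = res[["qsec"]])    # name kept from the report; value is mean(qsec)
      }

      # One data frame per (cyl, vs) combination present in mtcars; drop = TRUE skips empty ones
      groups <- split(mtcars, list(mtcars$cyl, mtcars$vs), drop = TRUE)

      # Pass the grouping values as a list, assumed to match how gapply supplies `key`
      results <- lapply(groups, function(g) group_fn(list(g$cyl[1], g$vs[1]), g))
      out <- do.call(rbind, results)
      rownames(out) <- NULL

      # Compare the returned column names with the names declared in the reproducer's schema
      schema_names <- c("cyl", "vs", "mpg_mean", "qsec_mean")
      print(out)
      print(setdiff(names(out), schema_names))  # all four returned names are absent from the schema
      ```

      None of firstGroupKey, secondGroupKey, mean_mpg or mean_cyl appears among the declared names (cyl, vs, mpg_mean, qsec_mean), so the last line prints all four returned names.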

      Attachments

        1. Rerror.log (9 kB, uploaded by MvR)


          People

            Assignee: Unassigned
            Reporter: MvR (n8shdw)
            Votes: 0
            Watchers: 1
