Spark / SPARK-31517

SparkR::orderBy with multiple columns descending produces error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.5
    • Fix Version/s: 3.1.0, 3.2.0
    • Component/s: SparkR
    • Labels: None
    • Environment: Databricks Runtime 6.5

    Description

      Passing two columns to `orderBy()` on a window specification, in order to sort by both columns in descending order, returns an error.

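      library(magrittr)
      library(SparkR)
      # Start or reuse a Spark session (one already exists on Databricks)
      sparkR.session()

      # Add the row names of mtcars as a "model" column and convert to a SparkDataFrame
      cars <- cbind(model = rownames(mtcars), mtcars)
      carsDF <- createDataFrame(cars)

      # Rank rows within each "cyl" partition by mpg, then disp, both descending
      carsDF %>%
        mutate(rank = over(rank(), orderBy(windowPartitionBy(column("cyl")), desc(column("mpg")), desc(column("disp"))))) %>%
        head()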

      This returns an error:

       Error in ns[[i]] : subscript out of bounds

      This appears to be part of a more general issue: the following code, which omits `desc()`, also fails:

      carsDF %>% 
        mutate(rank = over(rank(), orderBy(windowPartitionBy(column("cyl")), column("mpg"), column("disp")))) %>% 
        head()
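      A possible workaround, offered as an untested sketch: build the window specification in a separate variable so that the expression passed to `mutate()` stays short. This rests on an assumption not confirmed in this report, namely that the error is triggered by how `mutate()` processes the long inline argument expression rather than by `orderBy()` itself (the failing index `ns[[i]]` is not an `orderBy()` argument). The variable names `ws` and `ranked` are illustrative only.

```r
library(magrittr)
library(SparkR)

# Start or reuse a Spark session (only needed outside Databricks)
sparkR.session()

cars <- cbind(model = rownames(mtcars), mtcars)
carsDF <- createDataFrame(cars)

# Assumption: keeping the window spec out of the mutate() call avoids the error.
# Partition by cyl, order by mpg then disp, both descending.
ws <- orderBy(windowPartitionBy(column("cyl")),
              desc(column("mpg")), desc(column("disp")))

# The argument expression seen by mutate() is now just over(rank(), ws)
ranked <- carsDF %>%
  mutate(rank = over(rank(), ws)) %>%
  head()

print(ranked)
```

If this variant succeeds where the inline version fails, that would point at `mutate()`'s argument handling rather than at `orderBy()` with multiple columns.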

    People

        Assignee: Michael Chirico (michaelchirico)
        Reporter: Ross Bowen (rossbowen)
        Votes: 0
        Watchers: 3