Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-26901

Vectorized gapply should not prune columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • R, SQL
    • None

    Description

      Currently, if some columns can be pushed, it's being pushed through FlatMapGroupsInRWithArrow.

      explain(count(gapply(df,
                           "gear",
                           function(key, group) {
                             data.frame(gear = key[[1]], disp = mean(group$disp))
                           },
                           structType("gear double, disp double"))), TRUE)
      
      *(4) HashAggregate(keys=[], functions=[count(1)], output=[count#64L])
      +- Exchange SinglePartition
         +- *(3) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#67L])
            +- *(3) Project
               +- FlatMapGroupsInRWithArrow [...]
                  +- *(2) Sort [gear#9 ASC NULLS FIRST], false, 0
                     +- Exchange hashpartitioning(gear#9, 200)
                        +- *(1) Project [gear#9]
                           +- *(1) Scan ExistingRDD arrow[mpg#0,cyl#1,disp#2,hp#3,drat#4,wt#5,qsec#6,vs#7,am#8,gear#9,carb#10]
      

      This causes to send corrupt values R workers when the R native functions are executed.

        c(5, 5, 5, 5, 5)
        c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 2.47032822920623e-323)
        c(0, 0, 0, 0, 2.05578399548861e-314)
        c(3.4483079184909e-313, 3.4483079184909e-313, 3.4483079184909e-313, 5.31146529464635e-315, 0)
        c(0, 0, 0, 0, -2.63230705887168e+228)
        c(5, 5, 5, 0, 2.47032822920623e-323)
        c(7.90505033345994e-323, 7.90505033345994e-323, 0, 0, 4.17777978645388e-314)
        c(0, 0, 0, 0, -2.18328530492023e+219)
        c(3.4483079184909e-313, 5.31146529464635e-315, 0, 0, -2.63230127529109e+228)
        c(0, 0, 0, 0, 2.47032822920623e-323)
        c(5, 0, 0, 0, 4.17777978645388e-314)
        c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
        c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 2.47032822920623e-323)
        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.05578399548861e-314)
        c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 5.30757980430645e-315, 0)
        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.73302088532611e+228)
        c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 2.47032822920623e-323)
        c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 0, 0, 4.17777978645388e-314)
        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.04669129845114e+219)
        c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, 5.30757980430645e-315, 0, 0, -2.73301510174552e+228)
        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.47032822920623e-323)
        c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 4.17777978645388e-314)
      

      which should be:

      
        c(21, 21, 22.8, 24.4, 22.8, 19.2, 17.8, 32.4, 30.4, 33.9, 27.3, 21.4)
        c(6, 6, 4, 4, 4, 6, 6, 4, 4, 4, 4, 4)
        c(160, 160, 108, 146.7, 140.8, 167.6, 167.6, 78.7, 75.7, 71.1, 79, 121)
        c(110, 110, 93, 62, 95, 123, 123, 66, 52, 65, 66, 109)
        c(3.9, 3.9, 3.85, 3.69, 3.92, 3.92, 3.92, 4.08, 4.93, 4.22, 4.08, 4.11)
        c(2.62, 2.875, 2.32, 3.19, 3.15, 3.44, 3.44, 2.2, 1.615, 1.835, 1.935, 2.78)
        c(16.46, 17.02, 18.61, 20, 22.9, 18.3, 18.9, 19.47, 18.52, 19.9, 18.9, 18.6)
        c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
        c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1)
        c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)
        c(4, 4, 1, 2, 2, 4, 4, 1, 2, 1, 1, 2)
        c(26, 30.4, 15.8, 19.7, 15)
        c(4, 4, 8, 6, 8)
        c(120.3, 95.1, 351, 145, 301)
        c(91, 113, 264, 175, 335)
        c(4.43, 3.77, 4.22, 3.62, 3.54)
        c(2.14, 1.513, 3.17, 2.77, 3.57)
        c(16.7, 16.9, 14.5, 15.5, 14.6)
        c(0, 1, 0, 0, 0)
        c(1, 1, 1, 1, 1)
        c(5, 5, 5, 5, 5)
        c(2, 2, 4, 6, 8)
        c(21.4, 18.7, 18.1, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 21.5, 15.5, 15.2, 13.3, 19.2)
        c(6, 8, 6, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8)
        c(258, 360, 225, 360, 275.8, 275.8, 275.8, 472, 460, 440, 120.1, 318, 304, 350, 400)
        c(110, 175, 105, 245, 180, 180, 180, 205, 215, 230, 97, 150, 150, 245, 175)
        c(3.08, 3.15, 2.76, 3.21, 3.07, 3.07, 3.07, 2.93, 3, 3.23, 3.7, 2.76, 3.15, 3.73, 3.08)
        c(3.215, 3.44, 3.46, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.465, 3.52, 3.435, 3.84, 3.845)
        c(19.44, 17.02, 20.22, 15.84, 17.4, 17.6, 18, 17.98, 17.82, 17.42, 20.01, 16.87, 17.3, 15.41, 17.05)
        c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
        c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
        c(1, 2, 1, 4, 3, 3, 3, 4, 4, 4, 1, 2, 2, 4, 2)
      

      Attachments

        Issue Links

          Activity

            People

              gurwls223 Hyukjin Kwon
              gurwls223 Hyukjin Kwon
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: