Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: PySpark, SparkR, SQL
    • Labels: None

      Description

      I create a simple data frame in R and call the summary function on it (standard R, not SparkR).

      > library(magrittr)
      > df <- data.frame(
        date = as.Date("2015-01-01") + 0:99, 
        r = runif(100)
      )
      > df %>% summary
            date                  r          
       Min.   :2015-01-01   Min.   :0.01221  
       1st Qu.:2015-01-25   1st Qu.:0.30003  
       Median :2015-02-19   Median :0.46416  
       Mean   :2015-02-19   Mean   :0.50350  
       3rd Qu.:2015-03-16   3rd Qu.:0.73361  
       Max.   :2015-04-10   Max.   :0.99618  
      
      

      Notice that the date column can be summarised here. In SparkR, this gives an error.

      > ddf <- createDataFrame(sqlContext, df) 
      > ddf %>% summary
      Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
        org.apache.spark.sql.AnalysisException: cannot resolve 'avg(date)' due to data type mismatch: function average requires numeric types, not DateType;
      	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:61)
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
      	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290)
      	at org.apache.spark.sql.
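      The failure appears to come from summary() delegating to describe(), which builds count/mean/stddev/min/max aggregates over every column, and avg() is simply not defined for DateType. A quick check (a sketch, assuming the SparkR 1.5 describe() API; untested) is that restricting describe() to the numeric column resolves fine, while the date column alone reproduces the error:

      > # describe() on the numeric column alone works
      > ddf %>% describe("r") %>% collect
      > # describe() on the date column reproduces the AnalysisException above
      > ddf %>% describe("date") %>% collect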
      

      This is a rather annoying bug, since the SparkR documentation currently suggests that dates are now supported.
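
      Until describe()/summary() skips non-numeric columns (or falls back to min/max for them), a possible workaround is to summarise only the numeric columns and compute the date range separately. A minimal sketch, assuming the standard SparkR select(), agg(), min() and max() functions:

      > # summarise only the numeric column(s); the DateType column is excluded
      > ddf %>% select("r") %>% summary %>% collect
      > # min/max do support DateType, so the date range can still be computed
      > ddf %>% agg(min(ddf$date), max(ddf$date)) %>% collect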

            People

            • Assignee: Unassigned
            • Reporter: Vincent Warmerdam (cantdutchthis)
            • Shepherd: Shivaram Venkataraman
            • Votes: 1
            • Watchers: 7
