  Spark / SPARK-15799 Release SparkR on CRAN / SPARK-16519

Handle SparkR RDD generics that create warnings in R CMD check


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SparkR
    • Labels: None

    Description

      One of the warnings we get from R CMD check is that the RDD implementations of some generics are not documented. These generics are shared between RDDs and DataFrames in SparkR. The list includes:

      WARNING
      Undocumented S4 methods:
      generic 'cache' and siglist 'RDD'
      generic 'collect' and siglist 'RDD'
      generic 'count' and siglist 'RDD'
      generic 'distinct' and siglist 'RDD'
      generic 'first' and siglist 'RDD'
      generic 'join' and siglist 'RDD,RDD'
      generic 'length' and siglist 'RDD'
      generic 'partitionBy' and siglist 'RDD'
      generic 'persist' and siglist 'RDD,character'
      generic 'repartition' and siglist 'RDD'
      generic 'show' and siglist 'RDD'
      generic 'take' and siglist 'RDD,numeric'
      generic 'unpersist' and siglist 'RDD'

      As described in https://stat.ethz.ch/pipermail/r-devel/2003-September/027490.html, this looks like a limitation of R: exporting a generic from a package also exports all of that generic's implementations (methods), so R CMD check expects the RDD methods to be documented alongside the DataFrame ones.
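      The following is a minimal sketch, not SparkR's actual code, of why the RDD methods come along for the ride. It assumes simplified stand-in classes named "RDD" and "SparkDataFrame" with a single hypothetical `name` slot (the real SparkR classes hold JVM references). Both methods are registered on one `cache` generic, so anything that exposes that generic, such as an `exportMethods(cache)` directive in a package NAMESPACE, exposes both implementations.

```r
library(methods)

# Hypothetical stand-ins for SparkR's RDD and SparkDataFrame classes;
# the 'name' slot exists only for this illustration.
setClass("RDD", representation(name = "character"))
setClass("SparkDataFrame", representation(name = "character"))

# A single generic shared by both classes, as with 'cache' in SparkR.
setGeneric("cache", function(x) standardGeneric("cache"))

# Two implementations registered on the same generic object.
setMethod("cache", "RDD", function(x) paste("RDD", x@name, "cached"))
setMethod("cache", "SparkDataFrame", function(x) paste("DataFrame", x@name, "cached"))

# Lists every method attached to the generic: both signatures appear,
# which is why exporting the generic surfaces the RDD method too.
showMethods("cache")
```

      Calling `cache(new("RDD", name = "a"))` dispatches to the RDD method and returns "RDD a cached".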

      One way to get around this is to remove the RDD API or rename the methods in Spark 2.1.


          People

            Assignee: Felix Cheung (felixcheung)
            Reporter: Shivaram Venkataraman (shivaram)
            Votes: 0
            Watchers: 5
