Spark / SPARK-12479

sparkR collect on GroupedData throws R error "missing value where TRUE/FALSE needed"

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 2.0.0
    • Component/s: SparkR
    • Labels: None

    Description

      sparkR collect on GroupedData throws "missing value where TRUE/FALSE needed"

      Spark Version: 1.5.1
      R Version: 3.2.2

      I tracked down the root cause of this exception to a specific key for which the hashCode could not be calculated.

      The following code reproduces the problem when run in sparkR:

      hashCode <- getFromNamespace("hashCode","SparkR")
      hashCode("bc53d3605e8a5b7de1e8e271c2317645")
      Error in if (value > .Machine$integer.max) { :
      missing value where TRUE/FALSE needed

      I went one step further and realised that the problem happens because the bitwise shift below returns NA.

      bitwShiftL(-1073741824,1)

      where bitwShiftL is an R function.
      I believe the bitwShiftL function is working as it is supposed to. Therefore, this PR fixes it in the SparkR package: https://github.com/apache/spark/pull/10436
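The failure mode described above can be sketched outside SparkR. This is a minimal illustration, not the SparkR implementation and not the change made in the PR: `exceeds_int_max` and `safe_exceeds_int_max` are hypothetical helpers written for this example, and the `is.na` guard is only one assumed way to avoid the error. The sketch relies on two facts about R: integers range from -2147483647 to 2147483647, with -2^31 reserved for `NA_integer_`, and an `if` condition that evaluates to NA raises "missing value where TRUE/FALSE needed".

```r
# Hypothetical helper (not part of SparkR): an overflow check of the same
# shape as the `if` condition quoted in the error message.
exceeds_int_max <- function(value) {
  if (value > .Machine$integer.max) TRUE else FALSE
}

# Shifting -1073741824 left by one bit doubles it to -2^31, which is one
# below R's smallest representable integer.
doubled <- -1073741824 * 2
print(doubled < -.Machine$integer.max)

# The shift from the report; the reporter observed NA on R 3.2.2.
shifted <- bitwShiftL(-1073741824, 1)
print(is.na(shifted))

# Passing NA to the check reproduces the reported error message.
msg <- tryCatch(
  exceeds_int_max(NA_integer_),
  error = function(e) conditionMessage(e)
)
print(msg)

# Assumed guard (illustrative only): test for NA before comparing.
safe_exceeds_int_max <- function(value) {
  if (is.na(value)) {
    return(NA)
  }
  exceeds_int_max(value)
}
print(safe_exceeds_int_max(NA_integer_))
```

Whether an NA from the shift should be replaced, wrapped, or reported is a design decision for the hashing code itself; the guard above only shows that the comparison no longer throws.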

            People

              sunrui Sun Rui
              paulo.magalhaes Paulo Magalhaes
              Votes: 0
              Watchers: 3