[SPARK-12479] sparkR collect on GroupedData throws R error "missing value where TRUE/FALSE needed" - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.1
Fix Version/s: 2.0.0
Component/s: SparkR
Labels: None

Description

sparkR collect on GroupedData throws "missing value where TRUE/FALSE needed"

Spark Version: 1.5.1
R Version: 3.2.2

I tracked down the root cause of this exception to a specific key for which the hashCode could not be calculated.

The following code recreates the problem when run in sparkR:

hashCode <- getFromNamespace("hashCode","SparkR")
hashCode("bc53d3605e8a5b7de1e8e271c2317645")
Error in if (value > .Machine$integer.max) { :
missing value where TRUE/FALSE needed
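For context, this is the message R raises whenever an `if` condition evaluates to NA rather than TRUE or FALSE. A minimal sketch in plain R, independent of SparkR (`check_max` is a hypothetical helper, not part of any package):

```r
# Hypothetical helper: performs the same kind of comparison that appears
# in the error above, but converts the error into its message text.
check_max <- function(value) {
  tryCatch(
    if (value > .Machine$integer.max) "too large" else "fits",
    error = function(e) conditionMessage(e)
  )
}

check_max(1L)           # comparison is FALSE, so the else branch is taken
check_max(NA_integer_)  # NA > x is NA, so `if` signals an error instead
```

Any NA that reaches such a comparison fails this way, so the real question is where the NA comes from.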

I went one step further and realised that the problem happens because the bitwise shift below returns NA.

bitwShiftL(-1073741824,1)

where bitwShiftL is an R function.
I believe the bitwShiftL function is working as it is supposed to. Therefore, this PR fixes the problem in the SparkR package instead: https://github.com/apache/spark/pull/10436
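A likely explanation for the NA: -1073741824 shifted left by one bit is -2147483648, which is one below `-.Machine$integer.max`, and R reserves that 32-bit pattern for `NA_integer_`. The sketch below is not the code from either pull request; it only illustrates one way to compute a Java-style 31-multiplier string hash with 32-bit wraparound using double arithmetic, so that no intermediate value is coerced to an R integer. `wrap_int32` and `string_hash` are hypothetical names, and ASCII input is assumed.

```r
# Hypothetical helper: wrap a whole-number double into the signed 32-bit
# range [-2^31, 2^31), mimicking integer overflow in Java.
wrap_int32 <- function(x) {
  x <- x %% 4294967296                       # reduce modulo 2^32 into [0, 2^32)
  if (x >= 2147483648) x - 4294967296 else x # upper half maps to negatives
}

# Hypothetical helper: h = 31 * h + code for each character, wrapped after
# every step. Doubles represent these intermediate values exactly because
# they stay far below 2^53.
string_hash <- function(s) {
  h <- 0
  for (code in utf8ToInt(s)) {               # code points; equals bytes for ASCII
    h <- wrap_int32(31 * h + code)
  }
  h
}

# The key from this report now yields a finite number instead of an error.
string_hash("bc53d3605e8a5b7de1e8e271c2317645")
```

The result is returned as a double holding a whole number in the 32-bit range; whether SparkR needs it converted back to an integer type is outside the scope of this sketch.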

Issue Links

relates to

SPARK-15201 Handle integer overflow correctly in hash code computation

Resolved

links to

[Github] Pull Request #10436 (paulomagalhaes)

[Github] Pull Request #12976 (sun-rui)

People

Assignee: Sun Rui

Reporter: Paulo Magalhaes

Votes: 0

Watchers: 3

Dates

Created: 22/Dec/15 15:20

Updated: 08/May/16 07:18

Resolved: 08/May/16 07:17