[SPARK-11949] Query on DataFrame from cube gives wrong results - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.1
Fix Version/s: 1.6.0
Component/s: SQL
Labels:

Description

Reproduce bug

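// Run in spark-shell (Spark 1.5.1), where sc and sqlContext are predefined.
import sqlContext.implicits._ // provides .toDF() on RDDs of case classes

case class fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)

val df0 = sc.parallelize(Seq(
  fact(20151123, 18, 35, "room1", 18.6),
  fact(20151123, 18, 35, "room2", 22.4),
  fact(20151123, 18, 36, "room1", 17.4),
  fact(20151123, 18, 36, "room2", 25.6)
)).toDF()

// Average temperature for every combination of the four dimensions.
val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))

// Expected: the rows in which "date" is null. Actual: no rows.
cube0.where("date IS NULL").show()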

The query returns no rows. It should return some, because cube0 contains null in column 'date' several times (once for every grouping that does not include date). The issue arises because the cube function reuses the schema information from df0, in which the Int columns are marked non-nullable. If I change the types of the case class parameters to Option[T], the query gives correct results.
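The Option[T] workaround mentioned above could look roughly like the following. This is a sketch, not code from the report: the names FactOpt and nullableCube are hypothetical, and it assumes a Spark 1.5-style SparkContext and SQLContext are passed in (in spark-shell these are sc and sqlContext).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical variant of `fact`: the dimension fields are wrapped in Option,
// so Spark infers them as nullable columns.
case class FactOpt(date: Option[Int], hour: Option[Int], minute: Option[Int],
                   room_name: Option[String], temp: Double)

// Builds the same cube as in the report, but on top of the Option-based rows.
def nullableCube(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
  import sqlContext.implicits._ // provides .toDF() on RDDs of case classes
  val df = sc.parallelize(Seq(
    FactOpt(Some(20151123), Some(18), Some(35), Some("room1"), 18.6),
    FactOpt(Some(20151123), Some(18), Some(35), Some("room2"), 22.4),
    FactOpt(Some(20151123), Some(18), Some(36), Some("room1"), 17.4),
    FactOpt(Some(20151123), Some(18), Some(36), Some("room2"), 25.6)
  )).toDF()
  df.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))
}
```

Calling nullableCube(sc, sqlContext).where("date IS NULL").show() should then list the groupings in which date is null, as described above.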

Proposed solution: the cube function should set the nullable property to true in its output schema for the columns (dimensions) passed as parameters to the method call.
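The idea of marking the dimension columns as nullable can be illustrated from the user side. The sketch below is not the change made in the linked pull requests; withNullable and relaxNullability are hypothetical helper names, and the approach assumes it is acceptable to rebuild the DataFrame from its underlying RDD with an adjusted schema before calling cube.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.StructType

// Returns a copy of `schema` in which the named columns are marked nullable.
// All other fields are left unchanged.
def withNullable(schema: StructType, columns: Set[String]): StructType =
  StructType(schema.fields.map { field =>
    if (columns.contains(field.name)) field.copy(nullable = true) else field
  })

// Rebuilds `df` with the relaxed schema, so that a later cube over `columns`
// starts from columns that are already declared nullable.
def relaxNullability(sqlContext: SQLContext, df: DataFrame, columns: Set[String]): DataFrame =
  sqlContext.createDataFrame(df.rdd, withNullable(df.schema, columns))
```

With the reproduction above, this would be applied as relaxNullability(sqlContext, df0, Set("date", "hour", "minute", "room_name")) before calling cube on the result.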

I am new to Scala and Spark and do not know how to implement this. Could somebody please do so?

Issue Links

links to

[Github] Pull Request #10038 (viirya)

[Github] Pull Request #10067 (viirya)

People

Assignee: L. C. Hsieh

Reporter: Veli Kerim Celik

Votes: 0

Watchers: 7

Dates

Created: 24/Nov/15 11:41

Updated: 12/Dec/22 18:10

Resolved: 01/Dec/15 15:44