[SPARK-24440] When use constant as column we may get wrong answer versus impala - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0, 2.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

For query below:

select `date`, 100 as platform, count(distinct deviceid) as new_user from tv.clean_new_user where `date`=20180528 group by `date`, platform

We intended to group by 100 and get distinct deviceid number.
By spark sql,we get:

+-----------+-----------+-----------+--+
|   date    | platform  | new_user  |
+-----------+-----------+-----------+--+
| 20180528  | 100       | 521       |
| 20180528  | 100       | 82        |
| 20180528  | 100       | 3         |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 7         |
| 20180528  | 100       | 870       |
| 20180528  | 100       | 3         |
| 20180528  | 100       | 8         |
| 20180528  | 100       | 3         |
| 20180528  | 100       | 2204      |
| 20180528  | 100       | 1123      |
| 20180528  | 100       | 1         |
| 20180528  | 100       | 54        |
| 20180528  | 100       | 440       |
| 20180528  | 100       | 4         |
| 20180528  | 100       | 478       |
| 20180528  | 100       | 34        |
| 20180528  | 100       | 195       |
| 20180528  | 100       | 17        |
| 20180528  | 100       | 18        |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 84        |
| 20180528  | 100       | 1616      |
| 20180528  | 100       | 15        |
| 20180528  | 100       | 7         |
| 20180528  | 100       | 479       |
| 20180528  | 100       | 50        |
| 20180528  | 100       | 376       |
| 20180528  | 100       | 21        |
| 20180528  | 100       | 842       |
| 20180528  | 100       | 444       |
| 20180528  | 100       | 538       |
| 20180528  | 100       | 1         |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 7         |
| 20180528  | 100       | 17        |
| 20180528  | 100       | 133       |
| 20180528  | 100       | 7         |
| 20180528  | 100       | 415       |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 318       |
| 20180528  | 100       | 5         |
| 20180528  | 100       | 1         |
| 20180528  | 100       | 2060      |
| 20180528  | 100       | 1217      |
| 20180528  | 100       | 2         |
| 20180528  | 100       | 60        |
| 20180528  | 100       | 22        |
| 20180528  | 100       | 4         |
+-----------+-----------+-----------+--+

Actually sum of the deviceid is below:

0: jdbc:hive2://xxx/> select sum(t1.new_user) from (select `date`, 100 as platform, count(distinct deviceid) as new_user from tv.clean_new_user where `date`=20180528 group by `date`, platform)t1; 
+----------------+--+
| sum(new_user)  |
+----------------+--+
| 14816          |
+----------------+--+
1 row selected (4.934 seconds)

And the real distinct deviceid value is below:

0: jdbc:hive2://xxx/> select 100 as platform, count(distinct deviceid) as new_user from tv.clean_new_user where `date`=20180528;
+-----------+-----------+--+
| platform  | new_user  |
+-----------+-----------+--+
| 100       | 14773     |
+-----------+-----------+--+
1 row selected (2.846 seconds)

In impala,with the first query we can get result below:

[xxx] > select `date`, 100 as platform, count(distinct deviceid) as new_user from tv.clean_new_user where `date`=20180528 group by `date`, platform;Query: select `date`, 100 as platform, count(distinct deviceid) as new_user from tv.clean_new_user where `date`=20180528 group by `date`, platform
+----------+----------+----------+
| date     | platform | new_user |
+----------+----------+----------+
| 20180528 | 100      | 14773    |
+----------+----------+----------+
Fetched 1 row(s) in 1.00s

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: zhoukang

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 31/May/18 10:28

Updated:: 08/Oct/19 05:44

Resolved:: 08/Oct/19 05:44