[SPARK-25462] hive on spark - got a weird output when count(*) from this script - ASF JIRA


Details

Type: Question
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 1.6.2
Fix Version/s: None
Component/s: SQL
Labels: None
Environment: Spark 1.6.2, Hive 1.2.2, Hadoop 2.7.1

Description

Using HiveContext, I executed the following script:

with nt as (
  select label, score
  from (
    select *
    from (select label, score, row_number() over (order by score desc) as position from t1) t_1
    join (select count(*) as countall from t1) t_2
  ) ta
  where position <= countall * 0.4
)
select count(*) as c_positive from nt where label = 1

This is the result I got (see the attached screenshot).

The odd part is that calling count() on the RDD and on the DataFrame gives different outputs, as the screenshot shows.

Can someone help me out? Thanks a lot!

PS: The Parquet file I used is 'test.gz.parquet' in Attachments.
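
For reference, here is a minimal plain-Python sketch of what the query appears intended to compute: rank rows by score in descending order, keep the rows whose position is at most 40% of the total row count, then count the kept rows with label 1. The function name and the sample rows are hypothetical and are not the contents of test.gz.parquet. Python's sort is stable, so tied scores keep their input order; SQL's row_number() makes no such guarantee for ties.

```python
def count_positive_in_top_fraction(rows, fraction=0.4):
    """Count label == 1 rows among the top `fraction` of rows by score.

    rows: list of (label, score) tuples (hypothetical stand-in for table t1).
    """
    # Equivalent of row_number() over (order by score desc): highest score first.
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    countall = len(ranked)  # equivalent of select count(*) from t1

    # Keep rows where position <= countall * fraction (positions are 1-based).
    kept = [
        row
        for position, row in enumerate(ranked, start=1)
        if position <= countall * fraction
    ]

    # Equivalent of select count(*) ... where label = 1.
    return sum(1 for label, _score in kept if label == 1)


# Hypothetical sample data: 10 rows, so the top 40% is the 4 highest scores.
sample = [
    (1, 0.9), (0, 0.8), (1, 0.7), (0, 0.6), (1, 0.5),
    (0, 0.4), (1, 0.3), (0, 0.2), (0, 0.1), (1, 0.05),
]
result = count_positive_in_top_fraction(sample)
print(result)
```

A value computed this way on the real data could serve as a baseline to compare against the two count() outputs.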

Attachments

jira.png, 19/Sep/18 06:11, 871 kB, Gu Yuchen
test.gz.parquet, 19/Sep/18 06:19, 1 kB, Gu Yuchen

People

Assignee: Unassigned

Reporter: Gu Yuchen

Shepherd: Jeremy

Votes: 0

Watchers: 2

Dates

Created: 19/Sep/18 06:17

Updated: 19/Sep/18 11:06

Resolved: 19/Sep/18 11:05