Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Environment: Spark 2.1.0, Scala 2.11
Description
When I use spark-shell or spark-sql to execute count(distinct name) over a subquery, errors occur:
select count(distinct name) from (select * from mytest limit 10) as a
If I run this in HiveServer2, I get the correct result.
If I just execute select count(name) from (select * from mytest limit 10) as a, I also get the right result.
Besides, I see the same errors when I use distinct() or groupBy() with a subquery.
I think there may be a bug in key-reduce jobs that run over a subquery.
I will add the errors in a new comment.
I also tested dropDuplicates in spark-shell:
1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show
This throws exceptions.
2. spark.table("mytest").dropDuplicates("name").show
This returns the right result.
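The cases above can be collected into one reproduction script. This is only a sketch under assumptions: the real schema of mytest is not given in this report, so the one-column temp view below (and the object name CountDistinctRepro) are hypothetical stand-ins. The original table was presumably a Hive table, so an in-memory view may not trigger the failure.

```scala
import org.apache.spark.sql.SparkSession

object CountDistinctRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("count-distinct-subquery-repro")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the reporter's table: a single "name" column.
    Seq("a", "b", "a", "c").toDF("name").createOrReplaceTempView("mytest")

    // Reported as failing: distinct aggregation over a subquery with limit.
    spark.sql("select count(distinct name) from (select * from mytest limit 10) as a").show()

    // Reported as working: plain count over the same subquery.
    spark.sql("select count(name) from (select * from mytest limit 10) as a").show()

    // Reported as failing: dropDuplicates after limit.
    spark.sql("select * from mytest limit 10").dropDuplicates("name").show()

    // Reported as working: dropDuplicates without limit.
    spark.table("mytest").dropDuplicates("name").show()

    spark.stop()
  }
}
```

In spark-shell, where a session named spark already exists, the statements inside main can be pasted directly, leaving out the builder and the spark.stop() call.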
Issue Links
- duplicates SPARK-18528: limit + groupBy leads to java.lang.NullPointerException (Resolved)