-
Type:
Bug
-
Status: Resolved
-
Priority:
Major
-
Resolution: Duplicate
-
Affects Version/s: 2.1.0
-
Fix Version/s: None
-
Component/s: Spark Shell, SQL
-
Environment:
Spark 2.1.0, Scala 2.11
When I use spark-shell or spark-sql to run count(distinct name) over a subquery, errors occur:
select count(distinct name) from (select * from mytest limit 10) as a
Running the same query in HiveServer2 returns the correct result.
Running select count(name) from (select * from mytest limit 10) as a (without distinct) also returns the correct result.
I also see the same errors when using distinct() or groupBy() on a subquery.
I think there may be a bug in jobs that reduce by key over a subquery.
I will add the error output in a new comment.
I also tested dropDuplicates in spark-shell:
1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show
This throws exceptions.
2. spark.table("mytest").dropDuplicates("name").show
This returns the correct result.
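The cases above can be collected into one spark-shell session, sketched below. This is only an illustration of the report, not a verified reproduction: it assumes a table mytest with a name column already exists (its schema and contents are not given here), and it relies on the spark session that spark-shell predefines. The value name limited and the temporary view name limited_view are made up for the sketch.

```scala
// Paste into spark-shell on Spark 2.1.0; `spark` is the SparkSession the shell provides.
// Assumption: a table `mytest` with a `name` column already exists.

// The subquery from the report, held as a DataFrame (`limited` is a hypothetical name).
val limited = spark.sql("select * from mytest limit 10")

// Reported to FAIL: distinct / group-by style operations on top of the limited subquery.
spark.sql("select count(distinct name) from (select * from mytest limit 10) as a").show()
limited.dropDuplicates("name").show()
limited.groupBy("name").count().show()
limited.select("name").distinct().show()

// Same failing query, expressed through a temporary view (hypothetical view name).
limited.createOrReplaceTempView("limited_view")
spark.sql("select count(distinct name) from limited_view").show()

// Reported to SUCCEED: a plain count over the subquery, and dropDuplicates without the limit.
spark.sql("select count(name) from (select * from mytest limit 10) as a").show()
spark.table("mytest").dropDuplicates("name").show()
```

If the first failing statement aborts the paste, the remaining statements can be run one at a time to see which ones raise the exception.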
- duplicates
-
SPARK-18528 limit + groupBy leads to java.lang.NullPointerException
-
- Resolved
-