When I use spark-shell or spark-sql to run count(distinct name) over a subquery, errors occur:
select count(distinct name) from (select * from mytest limit 10) as a
If I run the same query through hive-server2, I get the correct result.
If I run select count(name) from (select * from mytest limit 10) as a instead, I also get the correct result.
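To make "correct result" concrete, here is a minimal sketch that models both queries on plain Scala collections. It assumes a hypothetical `name` column for `mytest`, because the real table contents are not given. `take(10)` stands in for `limit 10`, and NULL names are dropped to match SQL `count` semantics. Note that `limit` without `order by` may return any 10 rows, so the distinct count can legitimately vary between runs. The query should still never throw.

```scala
object ExpectedResults {
  // Hypothetical contents of mytest.name; the real data is not part of the report.
  val names: Seq[String] = Seq("alice", "bob", "alice", "carol", "bob", "dave",
    "erin", "alice", "frank", "bob", "gina", "hank")

  // Models (select * from mytest limit 10); Spark may pick any 10 rows without ORDER BY.
  val limited: Seq[String] = names.take(10)

  // Models select count(name) from (...) as a: SQL count skips NULLs.
  val countName: Long = limited.count(_ != null).toLong

  // Models select count(distinct name) from (...) as a: distinct non-NULL values.
  val countDistinctName: Long = limited.filter(_ != null).distinct.size.toLong

  def main(args: Array[String]): Unit = {
    println(s"count(name) = $countName")
    println(s"count(distinct name) = $countDistinctName")
  }
}
```

With this hypothetical data, the first 10 rows contain 10 names but only 6 distinct ones (alice, bob, carol, dave, erin, frank).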
I also see the same errors when I use distinct() or groupby() on a subquery.
I suspect a bug in operations that shuffle and reduce by key (such as distinct, group-by, and count(distinct)) when their input is a subquery.
I will post the error messages in a new comment.
I also tested dropDuplicates in spark-shell:
1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show
This throws exceptions.
It returns the correct result.
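A standalone reproduction could look roughly like the sketch below. It runs the reported SQL and Dataset calls against a local SparkSession. It is a sketch under assumptions: the rows are hypothetical, and `mytest` is registered as a temporary view rather than the reporter's Hive table, so it may not exercise the same code path. The object name `DistinctOverLimitRepro` is made up for illustration. If the bug reproduces, `main` fails with the exceptions described above; otherwise all three results print.

```scala
import org.apache.spark.sql.SparkSession

object DistinctOverLimitRepro {
  // Hypothetical stand-in rows for mytest; the real table and schema are not given.
  val names: Seq[String] = Seq("alice", "bob", "alice", "carol", "bob", "dave",
    "erin", "alice", "frank", "bob", "gina", "hank")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-over-limit-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    try {
      // Assumption: a temp view instead of the original Hive table.
      names.toDF("name").createOrReplaceTempView("mytest")

      // Reported to work: plain count over the limited subquery.
      spark.sql("select count(name) from (select * from mytest limit 10) as a").show()

      // Reported to fail: count(distinct name) over the same subquery.
      spark.sql("select count(distinct name) from (select * from mytest limit 10) as a").show()

      // Reported to fail: dropDuplicates on the limited result.
      spark.sql("select * from mytest limit 10").dropDuplicates("name").show()
    } finally {
      spark.stop()
    }
  }
}
```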