SPARK-19037

Running count(distinct x) on a subquery produces errors

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Spark Shell, SQL
    • Environment: Spark 2.1.0, Scala 2.11

    Description

      When I use spark-shell or spark-sql to run count(distinct name) over a subquery, errors occur:

      select count(distinct name) from (select * from mytest limit 10) as a

      If I run this query in HiveServer2, I get the correct result.

      If I run select count(name) from (select * from mytest limit 10) as a instead, I also get the correct result.

      I also see the same errors when I use distinct() or groupBy() with a subquery.

      I think there may be a bug in key-based reduce jobs that run over a subquery.

      I will add the errors in a new comment.
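
A minimal sketch of the queries described above, written as a standalone Scala program so it can be run outside spark-shell. The table contents, the `id` column, and the object and method names are assumptions made for illustration; the report does not give the schema or size of `mytest`. A small in-memory view may not trigger the failure, which could depend on how the real table is stored.

```scala
import org.apache.spark.sql.SparkSession

object SubqueryDistinctRepro {
  // Hypothetical stand-in for the reporter's `mytest` table (schema is assumed).
  def registerSample(spark: SparkSession): Unit =
    spark.createDataFrame(Seq((1, "a"), (2, "b"), (3, "a"), (4, "c")))
      .toDF("id", "name")
      .createOrReplaceTempView("mytest")

  // The query reported as failing on Spark 2.1.0.
  def countDistinct(spark: SparkSession): Long =
    spark.sql("select count(distinct name) from (select * from mytest limit 10) as a")
      .first().getLong(0)

  // The variant reported as returning the correct result.
  def countAll(spark: SparkSession): Long =
    spark.sql("select count(name) from (select * from mytest limit 10) as a")
      .first().getLong(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-19037-repro")
      .getOrCreate()
    registerSample(spark)

    println(s"count(name) = ${countAll(spark)}")
    println(s"count(distinct name) = ${countDistinct(spark)}")

    // Dataset API forms of the distinct() and groupBy() cases mentioned in the report.
    val limited = spark.table("mytest").limit(10)
    limited.select("name").distinct().show()
    limited.groupBy("name").count().show()

    spark.stop()
  }
}
```

To try this against the real table, skip `registerSample` and run the same statements in spark-shell, where `spark` is already defined.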

      In addition, I tested dropDuplicates in spark-shell:

      1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show

      This throws exceptions.

      2. spark.table("mytest").dropDuplicates("name").show

      This returns the correct result.
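
The two dropDuplicates cases above can be compared side by side with a sketch like the following. It wraps each call in `Try` so that a failure in the first case does not stop the second from running. The sample rows, the `id` column, and the object and method names are assumptions for illustration, not taken from the report, and `count()` is used in place of `show` only so each case returns a value.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object DropDuplicatesComparison {
  // Hypothetical stand-in for the reporter's `mytest` table (schema is assumed).
  def registerSample(spark: SparkSession): Unit =
    spark.createDataFrame(Seq((1, "a"), (2, "b"), (3, "a"), (4, "c")))
      .toDF("id", "name")
      .createOrReplaceTempView("mytest")

  // Case 1: dropDuplicates on top of a LIMIT subquery (reported as throwing).
  def viaLimit(spark: SparkSession): Try[Long] =
    Try(spark.sql("select * from mytest limit 10").dropDuplicates("name").count())

  // Case 2: dropDuplicates directly on the table (reported as working).
  def viaTable(spark: SparkSession): Try[Long] =
    Try(spark.table("mytest").dropDuplicates("name").count())

  // Print either the row count or the exception class and message.
  def report(label: String, result: Try[Long]): Unit = result match {
    case Success(n) => println(s"$label: ok, $n rows")
    case Failure(e) => println(s"$label: failed with ${e.getClass.getName}: ${e.getMessage}")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-19037-dropDuplicates")
      .getOrCreate()
    registerSample(spark)

    report("limit + dropDuplicates", viaLimit(spark))
    report("table + dropDuplicates", viaTable(spark))

    spark.stop()
  }
}
```

If the first case fails, calling `explain(true)` on the same DataFrame prints the logical and physical plans, which may help show how the limit and the aggregation are combined.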

            People

              Assignee: Unassigned
              Reporter: snodawn
              Votes: 0
              Watchers: 2
