  1. Spark
  2. SPARK-19037

Run count(distinct x) from sub query found some errors

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Spark Shell, SQL
    • Environment:

      Spark 2.1.0, Scala 2.11

      Description

      When I use spark-shell or spark-sql to run count(distinct name) over a subquery, errors occur:

      select count(distinct name) from (select * from mytest limit 10) as a

      If I run this in HiveServer2, I get the correct result.

      If I instead run select count(name) from (select * from mytest limit 10) as a, I also get the correct result.

      I also see the same errors when I use distinct() or groupBy() with a subquery.

      I think there may be a bug in key-reduce jobs that run over a subquery.

      I will add the errors in a new comment.

      I also tested dropDuplicates in spark-shell:

      1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show

      This throws exceptions.

      2. spark.table("mytest").dropDuplicates("name").show

      This returns the correct result.
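
      For anyone trying to reproduce this, the variants above can be collected into one script. This is only a sketch under assumptions: the object name Spark19037Repro, the attempt helper, and the four sample rows are made up for illustration, and mytest is registered here as an in-memory temp view with a single name column, which may not trigger the same behavior as the original table.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object Spark19037Repro {
  // Hypothetical helper: runs one variant, prints whether it threw,
  // and returns true on success so later variants still get a chance to run.
  def attempt(label: String)(body: => Unit): Boolean = Try(body) match {
    case Success(_) => println(s"[ok]   $label"); true
    case Failure(e) => println(s"[fail] $label: ${e.getClass.getName}"); false
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-19037 repro sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: stand-in data; the real mytest table is not shown in the report.
    Seq("a", "b", "a", "c").toDF("name").createOrReplaceTempView("mytest")

    // Reported as failing: distinct aggregate over a limited subquery.
    attempt("count(distinct name) over subquery") {
      spark.sql("select count(distinct name) from (select * from mytest limit 10) as a").show()
    }
    // Reported as working: plain count over the same subquery.
    attempt("count(name) over subquery") {
      spark.sql("select count(name) from (select * from mytest limit 10) as a").show()
    }
    // Reported as failing: dropDuplicates after a limited query.
    attempt("dropDuplicates after limit") {
      spark.sql("select * from mytest limit 10").dropDuplicates("name").show()
    }
    // Reported as working: dropDuplicates directly on the table.
    attempt("dropDuplicates on table") {
      spark.table("mytest").dropDuplicates("name").show()
    }

    spark.stop()
  }
}
```

      Each variant is wrapped separately so that one exception does not hide the outcome of the others. To test against the real table, drop the createOrReplaceTempView line and run with Hive support configured.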

              People

              • Assignee: Unassigned
              • Reporter: snodawn snodawn
              • Votes: 0
              • Watchers: 2
