Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27581

DataFrame countDistinct("*") fails with AnalysisException: "Invalid usage of '*' in expression 'count'"

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      If I have a DataFrame then I can use count("*") as an expression, e.g.:

      import org.apache.spark.sql.functions._
      val df = sql("select id % 100 from range(100000)")
      df.select(count("*")).first()
      

      However, if I try to do the same thing with countDistinct I get an error:

      import org.apache.spark.sql.functions._
      val df = sql("select id % 100 from range(100000)")
      df.select(countDistinct("*")).first()
      
      org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'count';
      

      As a workaround, I need to use expr, e.g.

      import org.apache.spark.sql.functions._
      val df = sql("select id % 100 from range(100000)")
      df.select(expr("count(distinct(*))")).first()
      

      You might be wondering "why not just use df.count() or df.distinct().count()?" but in my case I'd ultimately to compute both counts as part of the same aggregation, e.g.

      val (cnt, distinctCnt) = df.select(count("*"), countDistinct("*)).as[(Long, Long)].first()
      

      I'm reporting this because it's a minor usability annoyance / surprise for inexperienced Spark users.

        Attachments

          Activity

            People

            • Assignee:
              viirya L. C. Hsieh
              Reporter:
              joshrosen Josh Rosen

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment