Spark / SPARK-10690

SQL select count(distinct ) won't work for a normal load

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I have about 200 million email records. Spark 1.4.1 is running with
      --num-executors=5 --executor-cores=3
      --executor-memory=20g

      This setup should be more than enough for a distinct and a count, but when I run
      select count(distinct email) from theTable
      I always get an error that some executors are lost.
      If I run
      select count(*) from (select distinct email from theTable) tmp
      then it works.
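
      A minimal sketch of the two query shapes, for comparing their results.
      It assumes an in-memory SQLite table as a stand-in for theTable (the
      sample rows are hypothetical), so it only illustrates the SQL semantics
      and says nothing about Spark's executor behaviour. One thing it shows:
      the rewritten form counts a NULL email as one extra row unless NULLs are
      filtered out in the subquery.

```python
import sqlite3

# Hypothetical in-memory stand-in for theTable. SQLite is used only so the
# sketch runs anywhere; it does not reproduce the Spark executor failure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (email TEXT)")
rows = [("a@example.com",), ("b@example.com",), ("a@example.com",), (None,)]
conn.executemany("INSERT INTO theTable (email) VALUES (?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Form that failed in the report: count(distinct ...) ignores NULLs.
direct = scalar("SELECT count(DISTINCT email) FROM theTable")

# Workaround form: the subquery keeps one NULL row, and count(*) counts it.
rewritten = scalar(
    "SELECT count(*) FROM (SELECT DISTINCT email FROM theTable) tmp"
)

# Workaround with NULLs filtered out, which matches the direct form.
rewritten_no_nulls = scalar(
    "SELECT count(*) FROM "
    "(SELECT DISTINCT email FROM theTable WHERE email IS NOT NULL) tmp"
)

print(direct, rewritten, rewritten_no_nulls)
```

      If the email column can contain NULLs, adding the IS NOT NULL filter to
      the subquery keeps the workaround's result equal to count(distinct email).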

            People

              Assignee: Unassigned
              Reporter: Dave (dzhao11)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: