Spark / SPARK-41298

Calling count() on a DataFrame causes a performance issue


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      We are running the following code against Teradata:

      1) Dataset<Row> df = spark.read().format("jdbc") /* . . . */ .load();

      2) long count = df.count();

      When df.count() runs, Spark internally issues the query below to Teradata. The query consumes a lot of CPU on the Teradata side, and our DBAs have flagged it.

      Query: SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>)SPARK_SUB_TAB

      Response:

        1
        1
        1
        1
        1
        ..
        1
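
For context, the query shape above is consistent with how Spark's JDBC source (`JDBCRDD` in the 2.4 line, as far as we can tell) builds its scan: `count()` needs no columns, so column pruning leaves an empty column list, which is replaced by the constant `1`. Teradata then returns one row per table row, and Spark counts those rows on its own side. The sketch below is a simplified, stdlib-only model of that query construction, not Spark's actual code; the names `JdbcScanQuerySketch` and `buildScanQuery` are hypothetical, and Spark's per-partition predicates are reduced to one optional WHERE clause.

```java
import java.util.List;

// Hypothetical, simplified model of how a JDBC scan query is assembled.
final class JdbcScanQuerySketch {

    // Builds the SELECT sent for one scan. When no columns are required
    // (as with count()), the column list collapses to the constant 1, so
    // each source row still produces exactly one (tiny) result row.
    static String buildScanQuery(String tableOrQuery, List<String> requiredColumns, String whereClause) {
        String columnList = requiredColumns.isEmpty() ? "1" : String.join(",", requiredColumns);
        String where = (whereClause == null || whereClause.isEmpty()) ? "" : " WHERE " + whereClause;
        return "SELECT " + columnList + " FROM " + tableOrQuery + where;
    }

    public static void main(String[] args) {
        String source = "(<ONE_MILLION_ROWS_TABLE>) SPARK_SUB_TAB";
        // count(): every column is pruned, so only the constant is selected.
        System.out.println(buildScanQuery(source, List.of(), null));
        // A projection such as df.select("ID") would fetch that column instead.
        System.out.println(buildScanQuery(source, List.of("ID"), null));
    }
}
```

The point of the model is that the database still has to scan and ship every row; only the row width shrinks, which matches the CPU load described above.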

      Is this expected behavior from Spark, or is it a bug?
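
If only the row count is needed, one possible workaround (a sketch, not an official Spark recommendation) is to let Teradata compute it: pass a `COUNT(*)` subquery as the JDBC `dbtable` option so that only a single row is returned to Spark. The helper below only builds that subquery string; `PushdownCountSketch`, `countSubquery`, and the aliases `ROW_CNT` and `COUNT_SUB` are hypothetical names chosen for illustration.

```java
// Hypothetical helper: wraps a table (or parenthesized query with alias)
// in a COUNT(*) subquery usable as the JDBC "dbtable" option, so the
// database performs the aggregation and returns one row.
final class PushdownCountSketch {

    static String countSubquery(String tableOrQuery) {
        return "(SELECT COUNT(*) AS ROW_CNT FROM " + tableOrQuery + ") COUNT_SUB";
    }

    public static void main(String[] args) {
        // Prints: (SELECT COUNT(*) AS ROW_CNT FROM <ONE_MILLION_ROWS_TABLE>) COUNT_SUB
        System.out.println(countSubquery("<ONE_MILLION_ROWS_TABLE>"));
    }
}
```

With Spark this would be used roughly as `spark.read().format("jdbc").option("url", url).option("dbtable", PushdownCountSketch.countSubquery(table)).load().first()`, reading the value with `((Number) row.get(0)).longValue()`, since the type Teradata reports for `COUNT(*)` may surface in Spark as either an integer or a decimal column. Note that this count is a separate query, so it does not reflect any filters applied to `df` in Spark afterwards.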

          People

            Assignee: Unassigned
            Reporter: Ramakrishna (ramkychowdary0560)
            Votes: 0
            Watchers: 1