[SPARK-41298] Getting Count on data frame is giving the performance issue - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.4
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

We run the following code against Teradata:

1) Dataset<Row> df = spark.read().format("jdbc"). . . load();

2) long count = df.count();

When we execute df.count(), Spark internally issues the query below against Teradata. It consumes a lot of CPU on Teradata, and the DBAs have raised concerns about it.

Query: SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>) SPARK_SUB_TAB

Response:

Is this expected behavior from Spark, or is it a bug?
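A possible workaround, sketched here as an assumption and not confirmed anywhere in this ticket: the generated query selects a constant 1 for every row, so the rows are streamed to Spark and counted there. Passing a COUNT(*) subquery through the JDBC source's "dbtable" option would let Teradata compute the count and return a single row. The helper countSubquery, the alias cnt_q, the table name my_table, and the variable jdbcUrl are hypothetical names used only for illustration.

```java
public class CountPushdown {

    // Hypothetical helper: wraps a table name in a COUNT(*) subquery that can be
    // passed to the JDBC source's "dbtable" option. The alias cnt_q is an
    // arbitrary choice; a derived table needs some alias.
    static String countSubquery(String table) {
        return "(SELECT COUNT(*) AS cnt FROM " + table + ") cnt_q";
    }

    public static void main(String[] args) {
        // "my_table" is a placeholder, not a name taken from this issue.
        String dbtable = countSubquery("my_table");
        System.out.println(dbtable);

        // With Spark on the classpath and an existing SparkSession named spark,
        // the string would be used roughly like this (jdbcUrl is assumed):
        //
        //   Dataset<Row> df = spark.read().format("jdbc")
        //       .option("url", jdbcUrl)
        //       .option("dbtable", dbtable)
        //       .load();
        //   long count = ((Number) df.first().get(0)).longValue();
    }
}
```

With this approach the database returns one row holding the count, so df.count() is not called on the full table at all.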

People

Assignee: Unassigned

Reporter: Ramakrishna

Votes: 0

Watchers: 1

Dates

Created: 28/Nov/22 12:38

Updated: 07/Dec/22 06:53