[SPARK-38363] Avoid runtime error in Dataset.summary() when ANSI mode is on - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0, 3.2.2
Component/s: SQL
Labels: None

Description

When executing df.summary(), Spark SQL casts String columns to Double to compute the percentiles/mean/stddev metrics.

With ANSI mode on, this cast can fail at runtime when a string value is not a valid number.

Since this API is meant for getting a quick summary of the DataFrame, I suggest using "TryCast" for the problematic stats so that the API still works under ANSI mode.
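
To illustrate the difference in behavior, here is a minimal plain-Python sketch of the two cast semantics. It does not use Spark and is not the code from the pull request; the helper names `ansi_cast`, `try_cast`, and `summary_mean` are hypothetical, and `None` stands in for SQL NULL.

```python
from statistics import mean
from typing import List, Optional


def ansi_cast(value: str) -> float:
    """Strict cast (assumed model of CAST under ANSI mode): raises on bad input."""
    return float(value)  # raises ValueError for a string such as "abc"


def try_cast(value: str) -> Optional[float]:
    """Lenient cast (assumed model of TryCast): returns None instead of raising."""
    try:
        return float(value)
    except ValueError:
        return None


def summary_mean(column: List[str]) -> Optional[float]:
    """Mean over the values that cast cleanly; None if no value does."""
    numbers = [x for x in map(try_cast, column) if x is not None]
    return mean(numbers) if numbers else None


print(summary_mean(["1", "2", "abc"]))  # malformed "abc" is skipped, no error
print(summary_mean(["abc", "def"]))     # no numeric values, so the result is None
```

With the strict cast, a single malformed string aborts the whole summary; with the lenient cast, the statistic is computed over whatever values parse and the call always returns.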

Issue Links

links to: [GitHub] Pull Request #35699 (gengliangwang)

People

Assignee: Gengliang Wang

Reporter: Gengliang Wang

Dates

Created: 01/Mar/22 13:54

Updated: 02/Mar/22 02:54

Resolved: 02/Mar/22 02:54