Spark / SPARK-31735

Include all columns in the summary report


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: Spark Core, SQL

    Description

      Date columns (and other non-numeric, non-string columns) are excluded from the summary report:

       

      from datetime import datetime, timedelta, timezone

      from pyspark.sql import types as T
      from pyspark.sql import Row
      from pyspark.sql import functions as F

      START = datetime(2014, 1, 1, tzinfo=timezone.utc)
      n_days = 22
      date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
      schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])

      rdd = spark.sparkContext.parallelize(date_range)
      df = spark.createDataFrame(data=rdd, schema=schema)

      df.agg(F.max("date")).show()
      df.summary().show()
      +-------+
      |summary|
      +-------+
      |  count|
      |   mean|
      | stddev|
      |    min|
      |    25%|
      |    50%|
      |    75%|
      |    max|
      +-------+
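
      This matches summary() (like describe()) only computing statistics for numeric and string columns. As a minimal workaround sketch, not part of the original report and assuming the df built in the snippet above: casting the DateType column to string makes it appear in the report, with count/min/max populated and mean/stddev/percentiles left null for the string column.

      from pyspark.sql import functions as F

      # Cast the DateType column to string so summary() picks it up.
      df.withColumn("date", F.col("date").cast("string")).summary().show()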


    People

      Assignee: Unassigned
      Reporter: Fokko Driesprong
      Votes: 0
      Watchers: 3
