Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.4.5
- Fix Version/s: None
Description
Date columns, among other column types, are excluded from the output of df.summary():
from datetime import datetime, timedelta, timezone
from pyspark.sql import types as T
from pyspark.sql import Row
from pyspark.sql import functions as F

# Build 22 consecutive days starting at 2014-01-01 (UTC)
START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22
date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]

# Single non-nullable DateType column
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)

# Aggregate the date column directly, then request the summary
df.agg(F.max("date")).show()
df.summary().show()
+-------+
|summary|
+-------+
|  count|
|   mean|
| stddev|
|    min|
|    25%|
|    50%|
|    75%|
|    max|
+-------+
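For comparison, here is a minimal plain-Python sketch of the order statistics a date-aware summary could report for the same 22-day range. It does not use Spark at all. The helper name date_summary and the nearest-rank percentile rule are assumptions made for illustration only, so the quartile values may differ from whatever percentile method Spark would apply.

```python
from datetime import date, timedelta


def date_summary(dates):
    """Return count, min, nearest-rank quartiles, and max for a list of dates.

    Hypothetical helper for illustration; not part of PySpark.
    """
    ordered = sorted(dates)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one date")

    def nearest_rank(p):
        # Nearest-rank percentile: the value at 1-based rank ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)  # integer ceiling, avoids float error
        return ordered[rank - 1]

    return {
        "count": n,
        "min": ordered[0],
        "25%": nearest_rank(25),
        "50%": nearest_rank(50),
        "75%": nearest_rank(75),
        "max": ordered[-1],
    }


# Same range as the reproduction: 22 consecutive days from 2014-01-01.
START = date(2014, 1, 1)
dates = [START + timedelta(days=k) for k in range(22)]
stats = date_summary(dates)

for name, value in stats.items():
    print(f"{name:>7} {value}")
```

Mean and stddev are left out on purpose, since it is not obvious what type they should have for a date column.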