Details
- Type: Bug
- Status: Reopened
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
- Environment: Spark 2.3.x
Description
ISO 8601 allows the minutes, seconds, and milliseconds to be omitted from a time:
- hh:mm:ss.sss or hhmmss.sss
- hh:mm:ss or hhmmss
- hh:mm or hhmm
- hh

"Either the seconds, or the minutes and seconds, may be omitted from the basic or extended time formats for greater brevity but decreased accuracy: [hh]:[mm], [hh][mm] and [hh] are the resulting reduced accuracy time formats."
Source: Wikipedia, ISO 8601
Popular date/time APIs, such as java.time.ZonedDateTime, accept these reduced-accuracy formats. However, Spark's cast to TimestampType fails silently, producing null.
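For comparison, a minimal Java sketch showing that java.time parses the same reduced-accuracy strings (seconds omitted, offset present) that the cast below rejects:

```java
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;

public class Iso8601ReducedAccuracy {
    public static void main(String[] args) {
        // Seconds omitted, UTC designator "Z" -- valid ISO 8601, parses fine
        ZonedDateTime utc = ZonedDateTime.parse("2017-08-01T02:33Z");
        System.out.println(utc);

        // Seconds omitted, explicit numeric offset -- also valid, also parses
        OffsetDateTime offset = OffsetDateTime.parse("2017-08-01T02:33-03:00");
        System.out.println(offset);
    }
}
```

Both strings are exactly the values that cast(TimestampType) turns into null in the examples below.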
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// NON-ISO8601 (missing TZ offset) [OK]
val df1 = Seq(("2017-08-01T02:33")).toDF("eventTimeString")
val new_df1 = df1.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df1.show(false)

+----------------+-------------------+
|eventTimeString |eventTimeTS        |
+----------------+-------------------+
|2017-08-01T02:33|2017-08-01 02:33:00|
+----------------+-------------------+
// ISO8601 [FAIL]
val df2 = Seq(("2017-08-01T02:33Z")).toDF("eventTimeString")
val new_df2 = df2.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df2.show(false)

+-----------------+-----------+
|eventTimeString  |eventTimeTS|
+-----------------+-----------+
|2017-08-01T02:33Z|null       |
+-----------------+-----------+
// ISO8601 [FAIL]
val df3 = Seq(("2017-08-01T02:33-03:00")).toDF("eventTimeString")
val new_df3 = df3.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df3.show(false)

+----------------------+-----------+
|eventTimeString       |eventTimeTS|
+----------------------+-----------+
|2017-08-01T02:33-03:00|null       |
+----------------------+-----------+
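A possible workaround (an assumption, not a confirmed fix) is to parse with an explicit pattern instead of a bare cast, e.g. to_timestamp(col("eventTimeString"), "yyyy-MM-dd'T'HH:mmXXX"). Spark 2.3 hands such patterns to java.text.SimpleDateFormat, and the sketch below exercises that formatter directly to show the XXX specifier accepts both the "Z" designator and a numeric offset:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class OffsetPatternSketch {
    public static void main(String[] args) throws Exception {
        // XXX parses ISO 8601 offsets: "Z" as UTC, or "+hh:mm"/"-hh:mm"
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mmXXX");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        Date utc = fmt.parse("2017-08-01T02:33Z");
        Date offset = fmt.parse("2017-08-01T02:33-03:00");

        // 02:33 at -03:00 is 05:33 UTC, i.e. three hours after 02:33Z
        System.out.println((offset.getTime() - utc.getTime()) / 3600000L);
    }
}
```

If the pattern works here, the same pattern string passed to to_timestamp should behave identically on Spark 2.3, since both go through SimpleDateFormat.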