[SPARK-32630] Reduce user confusion and subtle bugs by optionally preventing date & timestamp comparison - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: SQL
Labels:

Description

https://issues.apache.org/jira/browse/SPARK-23549 made Spark's handling of date vs. timestamp comparison consistent with SQL, which, unfortunately, isn't consistent with common sense.

When dates are compared with timestamps, they are promoted to timestamps at midnight of the date, in the server timezone, which is almost always UTC. This only works well if all timestamps in the data are logically time instants as opposed to dates + times, which only become instants with a known timezone.

The fundamental issue is that dates are a human time concept and instant are a machine time concept. While we can technically promote one to the other, logically, it only works 100% if midnight for all dates in the system is in the server timezone.

Every major modern platform offers a clear distinction between machine time (instants) and human time (an instant with a timezone, UTC offset, etc.), because we have learned the hard way that date & time handling is a never-ending source of confusion and bugs. SQL, being an ancient language (40+ years old), is well behind software engineering best practices; using it as a guiding light is necessary for Spark to win market share, but unfortunate in every other way.

For example, Java has:

java.time.LocalDate
java.time.Instant
java.time.ZonedDateTime
java.time.OffsetDateTime

I am not suggesting we add new data types to Spark. I am suggesting we go to the heart of the matter, which is that most date vs. time handling issues are the result of confusion or carelessness.

What about introducing a new setting that makes comparisons between dates and timestamps illegal, preferably with a helpful exception message?

If it existed, I would certainly make it the default for all our clusters. The minor coding convenience that comes from being able to compare dates & timestamps with an automatic type promotion pales in comparison with the risk of subtle bugs that remain undetected for a long time.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Simeon Simeonov

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 16/Aug/20 21:12

Updated:: 16/Aug/20 21:14