Description
Spark 2.4 calculates time zone offsets from wall clock timestamp using `DateTimeUtils.getOffsetFromLocalMillis()` (see https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1088-L1118):
private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): Long = { var guess = tz.getRawOffset // the actual offset should be calculated based on milliseconds in UTC val offset = tz.getOffset(millisLocal - guess) if (offset != guess) { guess = tz.getOffset(millisLocal - offset) if (guess != offset) { // fallback to do the reverse lookup using java.sql.Timestamp // this should only happen near the start or end of DST val days = Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt val year = getYear(days) val month = getMonth(days) val day = getDayOfMonth(days) var millisOfDay = (millisLocal % MILLIS_PER_DAY).toInt if (millisOfDay < 0) { millisOfDay += MILLIS_PER_DAY.toInt } val seconds = (millisOfDay / 1000L).toInt val hh = seconds / 3600 val mm = seconds / 60 % 60 val ss = seconds % 60 val ms = millisOfDay % 1000 val calendar = Calendar.getInstance(tz) calendar.set(year, month - 1, day, hh, mm, ss) calendar.set(Calendar.MILLISECOND, ms) guess = (millisLocal - calendar.getTimeInMillis()).toInt } } guess }
Meanwhile, JDK's GregorianCalendar uses special methods of ZoneInfo, see https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/jdk/src/share/classes/java/util/GregorianCalendar.java#L2795-L2801:
if (zone instanceof ZoneInfo) { ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets); } else { int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ? internalGet(ZONE_OFFSET) : zone.getRawOffset(); zone.getOffsets(millis - gmtOffset, zoneOffsets); }
Need to investigate are there any differences in results between 2 approaches.
Attachments
Issue Links
- is caused by
-
SPARK-31443 Perf regression of toJavaDate
- Resolved
- links to