SPARK-31449

Investigate the difference between JDK and Spark's time zone offset calculation


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.5
    • Fix Version/s: 2.4.6
    • Component/s: SQL
    • Labels: None

    Description

      Spark 2.4 calculates time zone offsets from a wall-clock timestamp using `DateTimeUtils.getOffsetFromLocalMillis()` (see https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1088-L1118):

        private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): Long = {
          var guess = tz.getRawOffset
          // the actual offset should be calculated based on milliseconds in UTC
          val offset = tz.getOffset(millisLocal - guess)
          if (offset != guess) {
            guess = tz.getOffset(millisLocal - offset)
            if (guess != offset) {
              // fallback to do the reverse lookup using java.sql.Timestamp
              // this should only happen near the start or end of DST
              val days = Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
              val year = getYear(days)
              val month = getMonth(days)
              val day = getDayOfMonth(days)
      
              var millisOfDay = (millisLocal % MILLIS_PER_DAY).toInt
              if (millisOfDay < 0) {
                millisOfDay += MILLIS_PER_DAY.toInt
              }
              val seconds = (millisOfDay / 1000L).toInt
              val hh = seconds / 3600
              val mm = seconds / 60 % 60
              val ss = seconds % 60
              val ms = millisOfDay % 1000
              val calendar = Calendar.getInstance(tz)
              calendar.set(year, month - 1, day, hh, mm, ss)
              calendar.set(Calendar.MILLISECOND, ms)
              guess = (millisLocal - calendar.getTimeInMillis()).toInt
            }
          }
          guess
        }
      

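      Meanwhile, the JDK's `GregorianCalendar` calls a dedicated `ZoneInfo` method, `getOffsetsByWall()`, when the time zone is a `ZoneInfo`, and otherwise falls back to a lookup based on the raw (or explicitly set) zone offset (see https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/jdk/src/share/classes/java/util/GregorianCalendar.java#L2795-L2801):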

                  if (zone instanceof ZoneInfo) {
                      ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets);
                  } else {
                      int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ?
                                          internalGet(ZONE_OFFSET) : zone.getRawOffset();
                      zone.getOffsets(millis - gmtOffset, zoneOffsets);
                  }
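
The `else` branch above does a single lookup after shifting the wall-clock millis by the raw offset, which resembles the first guess in Spark's function. A minimal sketch of that idea follows. It assumes the public `TimeZone.getOffset(long)` is an acceptable stand-in for the `getOffsets(long, int[])` call in the quoted JDK code (which does not appear to be public API). The class name `RawOffsetLookup` and the helpers `offsetViaRawGuess` and `localMillis` are hypothetical names for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.TimeZone;

public class RawOffsetLookup {
    // Hypothetical helper: treat (millisLocal - rawOffset) as an approximate
    // UTC instant and ask the zone for the total offset at that instant.
    // Assumption: TimeZone.getOffset(long) stands in for the getOffsets call
    // used inside GregorianCalendar.
    static int offsetViaRawGuess(long millisLocal, TimeZone tz) {
        return tz.getOffset(millisLocal - tz.getRawOffset());
    }

    // Encodes wall-clock fields as milliseconds "as if in UTC", which is how
    // millisLocal is interpreted in Spark's getOffsetFromLocalMillis.
    static long localMillis(int year, int month, int day, int hour, int minute) {
        return LocalDateTime.of(year, month, day, hour, minute)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles");
        // Far from any DST transition, in winter and in summer.
        System.out.println(offsetViaRawGuess(localMillis(2020, 1, 15, 12, 0), tz));
        System.out.println(offsetViaRawGuess(localMillis(2020, 7, 15, 12, 0), tz));
    }
}
```

Away from transitions this single lookup is enough. Near a DST change the shifted instant can land on the wrong side of the transition, which is presumably why Spark adds a second guess and a `Calendar` fallback.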
      

      We need to investigate whether the two approaches produce different results.
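
One possible way to run such an investigation is to scan wall-clock timestamps and compare the two results. The sketch below is a Java port of only the two guess-based lookups from Spark's function, compared against the offset that a `GregorianCalendar` applies when it resolves the same wall-clock fields. It is not the code used to resolve this issue. The names `OffsetComparison`, `guessBasedOffset`, `calendarOffset`, and `findMismatches` are hypothetical, and it assumes that `TimeZone.getTimeZone` returns a `ZoneInfo` on the JDK in use, so that the calendar takes the `getOffsetsByWall` branch.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.OptionalInt;
import java.util.TimeZone;

public class OffsetComparison {
    // Port of the two guesses in Spark's getOffsetFromLocalMillis. Returns
    // empty where Spark would fall back to Calendar, since that path agrees
    // with the calendar by construction.
    static OptionalInt guessBasedOffset(long millisLocal, TimeZone tz) {
        int guess = tz.getRawOffset();
        int offset = tz.getOffset(millisLocal - guess);
        if (offset == guess) {
            return OptionalInt.of(guess);
        }
        guess = tz.getOffset(millisLocal - offset);
        return guess == offset ? OptionalInt.of(guess) : OptionalInt.empty();
    }

    // Offset the calendar effectively applies to the wall-clock fields:
    // local millis minus the UTC millis it resolves them to.
    static int calendarOffset(long millisLocal, TimeZone tz) {
        Calendar utc = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        utc.setTimeInMillis(millisLocal); // decompose millisLocal into fields
        Calendar local = new GregorianCalendar(tz);
        local.clear();
        local.set(utc.get(Calendar.YEAR), utc.get(Calendar.MONTH),
                utc.get(Calendar.DAY_OF_MONTH), utc.get(Calendar.HOUR_OF_DAY),
                utc.get(Calendar.MINUTE), utc.get(Calendar.SECOND));
        local.set(Calendar.MILLISECOND, utc.get(Calendar.MILLISECOND));
        return (int) (millisLocal - local.getTimeInMillis());
    }

    // Collects every sampled local timestamp where the guess-based result
    // exists and differs from the calendar-based one.
    static List<Long> findMismatches(TimeZone tz, long fromLocal, long toLocal, long stepMillis) {
        List<Long> mismatches = new ArrayList<>();
        for (long t = fromLocal; t < toLocal; t += stepMillis) {
            OptionalInt fast = guessBasedOffset(t, tz);
            if (fast.isPresent() && fast.getAsInt() != calendarOffset(t, tz)) {
                mismatches.add(t);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles");
        long start = 1577836800000L;            // 2020-01-01 00:00:00 as local millis
        long end = start + 366L * 86_400_000L;  // 2020 is a leap year
        List<Long> mismatches = findMismatches(tz, start, end, 60_000L);
        System.out.println("mismatching minutes in 2020: " + mismatches.size());
    }
}
```

Scanning at one-minute steps is a judgment call. Any mismatches would be expected to cluster around DST transitions, so a narrower window around each transition with a finer step may be more efficient for a wider sweep over zones and years.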


          People

            Assignee: Max Gekk (maxgekk)
            Reporter: Max Gekk (maxgekk)
            Votes: 0
            Watchers: 3
