Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0
    • Fix Version/s: 1.6.3, 2.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: java version "1.8.0_91"

    Description

      There is an issue with the DateTimeUtils.daysToMillis implementation. It affects DateTimeUtils.toJavaDate and ultimately CatalystTypeConverter, i.e., the conversion of a date stored in InternalRow as an Int number of days since the epoch into the java.sql.Date of the Row returned to the user.

      The issue can be reproduced with the following test (all of the tests below were run in my default timezone, Europe/Moscow):

      $ sbt -Duser.timezone=Europe/Moscow catalyst/console
      
      scala> java.util.Calendar.getInstance().getTimeZone
      res0: java.util.TimeZone = sun.util.calendar.ZoneInfo[id="Europe/Moscow",offset=10800000,dstSavings=0,useDaylight=false,transitions=79,lastRule=null]
      
      scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
      import org.apache.spark.sql.catalyst.util.DateTimeUtils._
      
      scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
      res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, 14332, 14696, 15060)
      

      For example, day 4108 since the epoch should map to 1981-04-01, but toJavaDate returns 1981-03-31 for both 4107 and 4108:

      scala> DateTimeUtils.toJavaDate(4107)
      res25: java.sql.Date = 1981-03-31
      
      scala> DateTimeUtils.toJavaDate(4108)
      res26: java.sql.Date = 1981-03-31
      
      scala> DateTimeUtils.toJavaDate(4109)
      res27: java.sql.Date = 1981-04-02
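
One plausible reading of the transcript above is that local midnight on 1981-04-01 does not exist in Europe/Moscow (clocks jumped forward at 00:00), so a conversion that looks up the zone offset at UTC midnight lands on the previous evening. The sketch below is only an illustration of that failure mode: `naiveDaysToMillis`, `localDateOf` and `MoscowGapDemo` are hypothetical names, and the naive arithmetic is an assumption about the kind of logic involved, not a copy of Spark's code.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.util.TimeZone

object MoscowGapDemo {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000
  val zoneId: ZoneId = ZoneId.of("Europe/Moscow")
  val timeZone: TimeZone = TimeZone.getTimeZone("Europe/Moscow")

  // Hypothetical naive conversion (an assumption, not taken from Spark):
  // the offset is looked up at UTC midnight rather than at local midnight.
  def naiveDaysToMillis(days: Int): Long = {
    val millisUtc = days.toLong * MillisPerDay
    millisUtc - timeZone.getOffset(millisUtc)
  }

  // Calendar date on which an instant falls in Europe/Moscow.
  def localDateOf(millis: Long): LocalDate =
    Instant.ofEpochMilli(millis).atZone(zoneId).toLocalDate

  def main(args: Array[String]): Unit = {
    val day = 4108
    val expected = LocalDate.ofEpochDay(day.toLong)
    // getTransition returns a non-null transition when the local time
    // falls inside a gap or an overlap.
    val transition = zoneId.getRules.getTransition(expected.atStartOfDay)
    println(s"expected date:     $expected")
    println(s"midnight in a gap: ${transition != null && transition.isGap}")
    println(s"naive result:      ${localDateOf(naiveDaysToMillis(day))}")
  }
}
```

If this reading is right, every day listed in the failing vector should be a date on which a Moscow clock change happened at local midnight.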
      

      There was a previous unsuccessful attempt to work around the problem in SPARK-11415. It seems that the issue involves flaws in the Java date implementation, and I don't see how it can be fixed without third-party libraries.

      I was not able to identify the library of choice for Spark. The following implementation uses JSR-310 (java.time):

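      import java.time.{Instant, LocalDate, ZoneId}

      // Days since the epoch for the local date on which the instant falls.
      def millisToDays(millisUtc: Long): SQLDate = {
        val instant = Instant.ofEpochMilli(millisUtc)
        val zonedDateTime = instant.atZone(ZoneId.systemDefault)
        zonedDateTime.toLocalDate.toEpochDay.toInt
      }
      
      // Instant of the first valid local time on the given day;
      // atStartOfDay moves past a DST gap at midnight.
      def daysToMillis(days: SQLDate): Long = {
        val localDate = LocalDate.ofEpochDay(days)
        val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
        zonedDateTime.toInstant.toEpochMilli
      }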
      

      It produces correct results:

      scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
      res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
      
      scala> new java.sql.Date(daysToMillis(4108))
      res36: java.sql.Date = 1981-04-01
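
For readers who want to try the round trip without a Spark checkout or the -Duser.timezone flag, here is a minimal self-contained sketch of the same idea. It differs from the proposal above in that the zone is passed explicitly; the `ZonedDayConversion` object, the `zone` parameter and the `brokenDays` helper are hypothetical additions for illustration, and `SQLDate` is assumed to be an alias for Int.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object ZonedDayConversion {
  // Assumption: SQLDate is a plain Int count of days since 1970-01-01.
  type SQLDate = Int

  // Days since the epoch for the local date of the instant in `zone`.
  def millisToDays(millisUtc: Long, zone: ZoneId): SQLDate =
    Instant.ofEpochMilli(millisUtc).atZone(zone).toLocalDate.toEpochDay.toInt

  // Instant of the first valid local time of the given day in `zone`.
  def daysToMillis(days: SQLDate, zone: ZoneId): Long =
    LocalDate.ofEpochDay(days.toLong).atStartOfDay(zone).toInstant.toEpochMilli

  // Days in `range` that do not survive a days -> millis -> days round trip.
  def brokenDays(zone: ZoneId, range: Range): Seq[Int] =
    range.filter(d => millisToDays(daysToMillis(d, zone), zone) != d)

  def main(args: Array[String]): Unit = {
    val moscow = ZoneId.of("Europe/Moscow")
    println(brokenDays(moscow, 0 to 20000))
  }
}
```

Passing the zone as a parameter is only a convenience for testing several zones in one JVM; it is not part of the change proposed in this issue.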
      

            People

              Assignee: Apache Spark (apachespark)
              Reporter: Dmitry Bushev (dbushev)