    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0
    • Fix Version/s: 1.6.3, 2.0.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      java version "1.8.0_91"

      Description

      There is an issue with the DateTimeUtils.daysToMillis implementation. It affects DateTimeUtils.toJavaDate and ultimately CatalystTypeConverter, i.e., the conversion of a date stored in InternalRow as an Int number of days since the epoch to the java.sql.Date in the Row returned to the user.

      The issue can be reproduced with the following test (all tests below were run in my default time zone, Europe/Moscow):

      $ sbt -Duser.timezone=Europe/Moscow catalyst/console
      
      scala> java.util.Calendar.getInstance().getTimeZone
      res0: java.util.TimeZone = sun.util.calendar.ZoneInfo[id="Europe/Moscow",offset=10800000,dstSavings=0,useDaylight=false,transitions=79,lastRule=null]
      
      scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
      import org.apache.spark.sql.catalyst.util.DateTimeUtils._
      
      scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
      res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, 14332, 14696, 15060)
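
The day numbers in this vector are easier to interpret as calendar dates. The following is a minimal sketch, not part of the original report: the object name `FailingDays` and the helper `toIsoDate` are hypothetical, and only the first four values from the output above are copied in. It uses `java.time.LocalDate.ofEpochDay`, which involves no time zone.

```scala
import java.time.LocalDate

object FailingDays {
  // First four day numbers copied from the REPL output above.
  val sample: Seq[Int] = Seq(4108, 4473, 4838, 5204)

  // Interpret a day count since 1970-01-01 as an ISO-8601 calendar date.
  // This is pure calendar arithmetic; no time zone is consulted.
  def toIsoDate(days: Int): String = LocalDate.ofEpochDay(days.toLong).toString

  // Pair each sample day number with its calendar date.
  def sampleDates: Seq[(Int, String)] = sample.map(d => d -> toIsoDate(d))
}
```

These four values map to 1 April of 1981 through 1984, which appear to coincide with clock changes in Europe/Moscow.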
      

      For example, day 4108 since the epoch should map to 1981-04-01, but toJavaDate returns 1981-03-31 for both 4107 and 4108:

      scala> DateTimeUtils.toJavaDate(4107)
      res25: java.sql.Date = 1981-03-31
      
      scala> DateTimeUtils.toJavaDate(4108)
      res26: java.sql.Date = 1981-03-31
      
      scala> DateTimeUtils.toJavaDate(4109)
      res27: java.sql.Date = 1981-04-02
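
One plausible way such a failure can arise is sketched below. This is a simplified model assumed for illustration, not code copied from Spark: the object `NaiveConversion` and its methods are hypothetical. The model shifts a UTC-midnight timestamp by the offset that `java.util.TimeZone.getOffset` reports for that same UTC instant, whereas the offset actually needed is the one in force at local midnight. The two can differ around a clock change.

```scala
import java.util.TimeZone

object NaiveConversion {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // ASSUMPTION: simplified offset-based model, not Spark's actual implementation.
  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val millisUtc = days.toLong * MillisPerDay
    // The offset is looked up at UTC midnight of the given day, although the
    // value needed is the offset in force at local midnight of that day.
    millisUtc - tz.getOffset(millisUtc)
  }

  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    // Shift to local wall-clock milliseconds, then floor to whole days.
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    Math.floorDiv(millisLocal, MillisPerDay).toInt
  }

  // Convert a day number to milliseconds and back again.
  def roundTrip(days: Int, tz: TimeZone): Int =
    millisToDays(daysToMillis(days, tz), tz)
}
```

Under this model, `NaiveConversion.roundTrip(4108, TimeZone.getTimeZone("Europe/Moscow"))` should not return 4108, while day 4107 should survive the round trip. That would be consistent with the output above, assuming the JVM's time zone data contains the 1981 Moscow transition.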
      

      There was a previous unsuccessful attempt to work around the problem in SPARK-11415. The issue seems to involve flaws in the Java date implementation, and I don't see how it can be fixed without third-party libraries.

      I was not able to identify the date library of choice for Spark. The following implementation uses JSR-310 (java.time):

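      import java.time.{Instant, LocalDate, ZoneId}

      // Convert UTC epoch milliseconds to the day number of the local date in the default zone.
      def millisToDays(millisUtc: Long): SQLDate = {
        val instant = Instant.ofEpochMilli(millisUtc)
        val zonedDateTime = instant.atZone(ZoneId.systemDefault)
        zonedDateTime.toLocalDate.toEpochDay.toInt
      }
      
      // Convert a day number to the UTC epoch milliseconds of the first valid local time on that date.
      def daysToMillis(days: SQLDate): Long = {
        val localDate = LocalDate.ofEpochDay(days)
        val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
        zonedDateTime.toInstant.toEpochMilli
      }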
      

      This implementation produces correct results:

      scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
      res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
      
      scala> new java.sql.Date(daysToMillis(4108))
      res36: java.sql.Date = 1981-04-01
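
The round-trip check can also be run without changing the JVM default time zone. The sketch below is a variation on the proposal above, not part of it: the object `ZonedDayConversion`, the explicit `zone` parameter, and the helper `roundTripFailures` are assumptions introduced for illustration, and `Int` stands in for `SQLDate`.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object ZonedDayConversion {
  // Day number of the local date that the given UTC instant falls on in `zone`.
  def millisToDays(millisUtc: Long, zone: ZoneId): Int =
    Instant.ofEpochMilli(millisUtc).atZone(zone).toLocalDate.toEpochDay.toInt

  // UTC epoch milliseconds of the first valid local time on the given day in `zone`.
  def daysToMillis(days: Int, zone: ZoneId): Long =
    LocalDate.ofEpochDay(days.toLong).atStartOfDay(zone).toInstant.toEpochMilli

  // Day numbers in `range` that do not survive a days -> millis -> days round trip.
  def roundTripFailures(range: Range, zone: ZoneId): Seq[Int] =
    range.filter(d => millisToDays(daysToMillis(d, zone), zone) != d)
}
```

For example, `ZonedDayConversion.roundTripFailures(0 to 20000, ZoneId.of("Europe/Moscow"))` repeats the check above for an explicit zone. Passing the zone as a parameter keeps the functions deterministic in tests, at the cost of a different signature from the one proposed.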
      


              People

              • Assignee: apachespark Apache Spark
              • Reporter: dbushev Dmitry Bushev