  1. Spark
  2. SPARK-14428

[SQL] Allow more flexibility when parsing dates and timestamps in json datasources

Details

    Description

      Reading JSON that contains dates and timestamps is currently limited to a fixed set of predetermined string formats, or to long values.

      1) It should be possible to set an option on the JSON datasource to parse dates and timestamps using a custom string format.
      2) It should be possible to change how long values since the epoch are interpreted. The datasource could support different precisions such as days, seconds, milliseconds, microseconds and nanoseconds.
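To illustrate why point 2 matters, here is a small sketch showing how the same raw number denotes very different instants depending on the assumed precision. The object name `EpochPrecisionDemo` and the sample values are made up for illustration; only standard `java.time` calls are used.

```scala
import java.time.{Instant, LocalDate}

object EpochPrecisionDemo {
  // The same raw JSON number, read under two different precision assumptions.
  val raw: Long = 1460000000L

  // Interpreted as seconds since the epoch: a moment in April 2016.
  val asSeconds: Instant = Instant.ofEpochSecond(raw)

  // Interpreted as milliseconds since the epoch: a moment in January 1970.
  val asMillis: Instant = Instant.ofEpochMilli(raw)

  // A day count is a much smaller number; 16898 days lands on the same
  // calendar date as the seconds interpretation above.
  val asDays: LocalDate = LocalDate.ofEpochDay(16898L)

  def main(args: Array[String]): Unit = {
    println(s"seconds: $asSeconds")
    println(s"millis:  $asMillis")
    println(s"days:    $asDays")
  }
}
```

Without a way to declare the precision, a reader has to guess which of these interpretations the producer intended.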

      Something along the lines of:

      object Precision extends Enumeration {
        val days, seconds, milliseconds, microseconds, nanoseconds = Value
      }
      
      def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
      ...
      
      // String formats default to empty, meaning "use the built-in parsing".
      val dateFormat = parameters.getOrElse("dateFormat", "").trim
      val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
      // Precisions are looked up by name so they can be passed to convertWithPrecision.
      val longDatePrecision = Precision.withName(parameters.getOrElse("longDatePrecision", "days"))
      val longTimestampPrecision = Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))
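The body of `convertWithPrecision` is left open in the proposal. One possible way to fill it in, as a minimal sketch only: express every precision as a number of nanoseconds per unit, then multiply or divide by the ratio. The names `PrecisionSketch`, `DateOptions` and `parseOptions` are hypothetical and not part of Spark; flooring when moving to a coarser unit is an assumption made here so that instants before the epoch map to the earlier day.

```scala
object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

object PrecisionSketch {
  // Nanoseconds contained in one unit of each precision.
  private val nanosPerUnit: Map[Precision.Value, Long] = Map(
    Precision.nanoseconds  -> 1L,
    Precision.microseconds -> 1000L,
    Precision.milliseconds -> 1000000L,
    Precision.seconds      -> 1000000000L,
    Precision.days         -> 86400L * 1000000000L
  )

  // Converts `time` from one precision to another.
  // Going finer multiplies (failing loudly on overflow);
  // going coarser divides, rounding toward negative infinity.
  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = {
    val fromNanos = nanosPerUnit(from)
    val toNanos = nanosPerUnit(to)
    if (fromNanos >= toNanos) java.lang.Math.multiplyExact(time, fromNanos / toNanos)
    else java.lang.Math.floorDiv(time, toNanos / fromNanos)
  }

  // Hypothetical holder for the four proposed options.
  case class DateOptions(
      dateFormat: String,
      timestampFormat: String,
      longDatePrecision: Precision.Value,
      longTimestampPrecision: Precision.Value)

  // Reads the proposed option names from a plain parameter map.
  // Precision.withName throws NoSuchElementException on an unknown name.
  def parseOptions(parameters: Map[String, String]): DateOptions = DateOptions(
    dateFormat = parameters.getOrElse("dateFormat", "").trim,
    timestampFormat = parameters.getOrElse("timestampFormat", "").trim,
    longDatePrecision =
      Precision.withName(parameters.getOrElse("longDatePrecision", "days")),
    longTimestampPrecision =
      Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))
  )
}
```

For example, `PrecisionSketch.convertWithPrecision(86400000L, Precision.milliseconds, Precision.days)` yields one day, and `parseOptions(Map("longDatePrecision" -> "seconds"))` keeps the defaults for the other three options.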
      

      and

            case (VALUE_STRING, DateType) =>
              val stringValue = parser.getText
              val days = if (configOptions.dateFormat.nonEmpty) {
                // User-defined format: parse to milliseconds, then convert to the SQL DATE representation (number of days since epoch).
                val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread-safe, so a new instance is created per value.
                DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
              } else if (stringValue.nonEmpty && stringValue.forall(_.isDigit)) {
                // All digits: treat the string as a long value in the configured precision.
                DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
              } else {
                // The format of this string will probably be "yyyy-MM-dd".
                DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
              }
              days.toInt
      
            case (VALUE_NUMBER_INT, DateType) =>
              DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
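Since `SimpleDateFormat` is not thread-safe, the string branch could also be written with `java.time.format.DateTimeFormatter`, which is immutable. The following is a standalone sketch of that variant, not the proposed Spark change: `DateParsingSketch`, `stringToDays` and the `longIsMillis` flag are hypothetical names, and the flag stands in for the full precision option to keep the example short.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateParsingSketch {
  private val MillisPerDay = 86400000L

  // Returns days since the epoch for a JSON string value.
  // dateFormat:   user-supplied pattern, or "" to fall back to the defaults.
  // longIsMillis: assumed stand-in for the longDatePrecision option;
  //               true means digits are milliseconds, false means days.
  def stringToDays(value: String, dateFormat: String, longIsMillis: Boolean): Int = {
    val days: Long =
      if (dateFormat.nonEmpty) {
        // DateTimeFormatter is immutable and safe to share between threads.
        LocalDate.parse(value, DateTimeFormatter.ofPattern(dateFormat)).toEpochDay
      } else if (value.nonEmpty && value.forall(_.isDigit)) {
        // All digits: a long value in the configured precision.
        if (longIsMillis) java.lang.Math.floorDiv(value.toLong, MillisPerDay)
        else value.toLong
      } else {
        // Default ISO format, yyyy-MM-dd.
        LocalDate.parse(value).toEpochDay
      }
    // Fail loudly rather than silently truncating an out-of-range day count.
    java.lang.Math.toIntExact(days)
  }
}
```

With this sketch, `"07/04/2016"` under the pattern `dd/MM/yyyy`, the digit string `"16898"` read as days, and `"2016-04-07"` with no pattern all resolve to the same day number.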
      

      With similar handling for Timestamps.
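A rough sketch of what the timestamp side might look like, assuming the target is microseconds since the epoch (the unit Spark SQL uses internally for TimestampType) and assuming strings without a zone are read as UTC. `TimestampParsingSketch`, `stringToMicros` and `longToMicros` are hypothetical names; `java.util.concurrent.TimeUnit` is used here in place of the proposed `Precision` enumeration.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit
import java.util.concurrent.TimeUnit

object TimestampParsingSketch {
  // Parses a timestamp string with a user-supplied pattern.
  // Assumption: the string carries no zone and is interpreted as UTC.
  def stringToMicros(value: String, timestampFormat: String): Long = {
    val formatter = DateTimeFormatter.ofPattern(timestampFormat)
    val instant = LocalDateTime.parse(value, formatter).toInstant(ZoneOffset.UTC)
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)
  }

  // Converts a long in the given unit to microseconds.
  // Note: TimeUnit.convert truncates toward zero when the source unit is
  // finer than microseconds, and saturates instead of overflowing.
  def longToMicros(value: Long, unit: TimeUnit): Long =
    TimeUnit.MICROSECONDS.convert(value, unit)
}
```

For instance, `stringToMicros("2016-04-07 03:33:20", "yyyy-MM-dd HH:mm:ss")` and `longToMicros(1460000000L, TimeUnit.SECONDS)` describe the same instant.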

            People

              Assignee: Unassigned
              Reporter: Michel Lemay (FlamingMike)