SPARK-14428

[SQL] Allow more flexibility when parsing dates and timestamps in json datasources

    Details

      Description

      Reading JSON that contains dates and timestamps is currently limited to predetermined string formats or long values.

      1) It should be possible to set an option on the JSON data source to parse dates and timestamps using a custom string format.
      2) It should be possible to change how long values since the epoch are interpreted. The data source could support different precisions such as days, seconds, milliseconds, microseconds and nanoseconds.
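
      To make item 2 concrete, here is a minimal sketch (not Spark code; it uses only java.time, and the object name and sample value are made up for illustration) of how the same long value maps to very different instants depending on the precision it is assumed to have:

```scala
import java.time.Instant

// Hypothetical demo object; the raw value is an arbitrary example.
object EpochPrecisionDemo {
  // The same raw long, as it might appear in a JSON field.
  val raw: Long = 1460000000L

  // Interpreted as seconds since the epoch.
  val asSeconds: Instant = Instant.ofEpochSecond(raw)

  // Interpreted as milliseconds since the epoch.
  val asMillis: Instant = Instant.ofEpochMilli(raw)

  def main(args: Array[String]): Unit = {
    println(asSeconds) // 2016-04-07T03:33:20Z
    println(asMillis)  // 1970-01-17T21:33:20Z
  }
}
```

      Without an option stating the precision, a reader has no way to tell which of the two interpretations the producer of the file intended.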

      Something along the lines of:

      object Precision extends Enumeration {
        val days, seconds, milliseconds, microseconds, nanoseconds = Value
      }

      def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
      ...

      val dateFormat = parameters.getOrElse("dateFormat", "").trim
      val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
      // Look the precision names up in the enumeration so they can be passed to convertWithPrecision.
      val longDatePrecision = Precision.withName(parameters.getOrElse("longDatePrecision", "days"))
      val longTimestampPrecision = Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))
      

      and

            case (VALUE_STRING, DateType) =>
              val stringValue = parser.getText
              val days = if (configOptions.dateFormat.nonEmpty) {
                // User-defined format; the result must comply with the SQL DATE representation (number of days).
                val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread safe.
                DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
              } else if (stringValue.nonEmpty && stringValue.forall(_.isDigit)) {
                // Numeric string: interpret it using the configured precision.
                DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
              } else {
                // The format of this string will probably be "yyyy-MM-dd".
                DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
              }
              days.toInt

            case (VALUE_NUMBER_INT, DateType) =>
              DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
      

      With similar handling for Timestamps.
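
      The proposal above leaves the body of convertWithPrecision open. One possible way to fill it in, shown here as a standalone sketch rather than as Spark code, is to map each precision onto java.util.concurrent.TimeUnit. The sketch also swaps SimpleDateFormat for java.time.format.DateTimeFormatter, which is immutable and thread safe; note that its pattern letters are similar but not identical to SimpleDateFormat's. The names DatePrecisionSketch, DateOptions and parseDateString are hypothetical and exist only for this illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.concurrent.TimeUnit

// Hypothetical standalone sketch; none of these names exist in Spark.
object DatePrecisionSketch {
  object Precision extends Enumeration {
    val days, seconds, milliseconds, microseconds, nanoseconds = Value
  }

  // Map each precision onto the matching java.util.concurrent.TimeUnit.
  private def unitOf(p: Precision.Value): TimeUnit = p match {
    case Precision.days         => TimeUnit.DAYS
    case Precision.seconds      => TimeUnit.SECONDS
    case Precision.milliseconds => TimeUnit.MILLISECONDS
    case Precision.microseconds => TimeUnit.MICROSECONDS
    case Precision.nanoseconds  => TimeUnit.NANOSECONDS
  }

  // TimeUnit.convert truncates toward zero when converting to a coarser unit,
  // so values before the epoch are not floored; it saturates on overflow.
  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long =
    unitOf(to).convert(time, unitOf(from))

  // Assumed stand-in for the parsed data source options.
  final case class DateOptions(
      dateFormat: String = "",
      longDatePrecision: Precision.Value = Precision.days)

  // Mirrors the three branches of the VALUE_STRING case: custom format,
  // numeric string, then the default ISO "yyyy-MM-dd" form.
  def parseDateString(s: String, opts: DateOptions): Int = {
    val days: Long =
      if (opts.dateFormat.nonEmpty) {
        LocalDate.parse(s, DateTimeFormatter.ofPattern(opts.dateFormat)).toEpochDay
      } else if (s.nonEmpty && s.forall(_.isDigit)) {
        convertWithPrecision(s.toLong, opts.longDatePrecision, Precision.days)
      } else {
        LocalDate.parse(s).toEpochDay
      }
    days.toInt
  }
}
```

      For example, under these assumptions `DatePrecisionSketch.parseDateString("07/04/2016", DatePrecisionSketch.DateOptions(dateFormat = "dd/MM/yyyy"))` and `DatePrecisionSketch.parseDateString("2016-04-07", DatePrecisionSketch.DateOptions())` both yield the same day count. Timestamps could follow the same pattern with microseconds as the target precision.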

              People

              • Assignee: Unassigned
              • Reporter: FlamingMike Michel Lemay