Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-33995

Make datetime addition easier for years, weeks, hours, minutes, and seconds

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      There are add_months and date_add functions that make it easy to perform datetime addition with months and days, but there isn't an easy way to perform datetime addition with years, weeks, hours, minutes, or seconds with the Scala/Python/R APIs.

      Users need to write code like expr("first_datetime + INTERVAL 2 hours") to add two hours to a timestamp with the Scala API, which isn't desirable.  We don't want to make Scala users manipulate SQL strings.

      We can expose the make_interval SQL function to make any combination of datetime addition possible.  That'll make tons of different datetime addition operations possible and will be valuable for a wide array of users.

      make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and secs.

      There are different ways to expose the make_interval functionality to Scala/Python/R users:

      • Option 1: Single make_interval function that takes 7 arguments
      • Option 2: expose a few interval functions
        • make_date_interval function that takes years, months, days
        • make_time_interval function that takes hours, minutes, seconds
        • make_datetime_interval function that takes years, months, days, hours, minutes, seconds
      • Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, add_minutes, and add_seconds as Column methods.  
      • Option 4: Expose the add_years, add_hours, etc. as column functions.  add_weeks and date_add have already been exposed in this manner.  

      Option 1 is nice from a maintenance perspective cause it's a single function, but it's not standard from a user perspective.  Most languages support datetime instantiation with these arguments: years, months, days, hours, minutes, seconds.  Mixing weeks into the equation is not standard.

      As a user, Option 3 would be my preference.  col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember and type.  col("first_datetime") + make_time_interval(lit(2), lit(0), lit(30)) isn't as nice.  col("first_datetime") + make_interval(lit(0), lit(0), lit(0), lit(0), lit(2), lit(0), lit(30)) is harder still.

      Any of these options is an improvement to the status quo.  Let me know what option you think is best and then I'll make a PR to implement it, building off of Max's foundational work of course

      Attachments

        Activity

          People

            mrpowers Matthew Powers
            mrpowers Matthew Powers
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: