Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27790

Support ANSI SQL INTERVAL types

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • 3.2.0
    • None
    • SQL
    • None

    Description

      Spark has an INTERVAL data type, but it is “broken”:

      1. It cannot be persisted
      2. It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days.

      I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type.

      • ANSI describes two non overlapping “classes”:
        • YEAR-MONTH,
        • DAY-SECOND ranges
      • Members within each class can be compared and sorted.
      • Supports datetime arithmetic
      • Can be persisted.

      The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings.

      Milestone 1 – Spark Interval equivalency ( The new interval types meet or exceed all function of the existing SQL Interval):

      • Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DLL string.
      • Infra support: check the caller sides of DateType/TimestampType
      • Support the two new interval types in Dataset/UDF.
      • Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values)
      • Interval arithmetic(interval * num, interval / num, interval +/- interval)
      • Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval
      • Cast to and from the new two interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types
      • Support sorting intervals.

      Milestone 2 – Persistence:

      • Ability to create tables of type interval
      • Ability to write to common file formats such as Parquet and JSON.
      • INSERT, SELECT, UPDATE, MERGE
      • Discovery

      Milestone 3 – Client support

      • JDBC support
      • Hive Thrift server

      Milestone 4 – PySpark and Spark R integration

      • Python UDF can take and return intervals
      • DataFrame support

      Attachments

        Issue Links

          1.
          Add ANSI SQL day-time and year-month interval types Sub-task Resolved Max Gekk
          2.
          Support java.time.Duration as an external type of the day-time interval type Sub-task Resolved Max Gekk
          3.
          Support java.time.Period as an external type of the year-month interval type Sub-task Resolved Max Gekk
          4.
          Update the Spark SQL guide about day-time and year-month interval types Sub-task Resolved Max Gekk
          5.
          Test year-month and day-time intervals in UDF Sub-task Resolved Max Gekk
          6.
          Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types Sub-task Resolved Max Gekk
          7.
          Support casting of year-month intervals to strings Sub-task Resolved Max Gekk
          8.
          Support casting of day-time intervals to strings Sub-task Resolved Max Gekk
          9.
          Support add and subtract of ANSI SQL intervals Sub-task Resolved Max Gekk
          10.
          Overflow in round trip conversion from micros to duration Sub-task Resolved Max Gekk
          11.
          Add round trip tests for period <-> month and duration <-> micros Sub-task Resolved jiaan.geng
          12.
          Support ANSI SQL intervals by the aggregate function `sum` Sub-task Resolved jiaan.geng
          13.
          Assign pretty names to YearMonthIntervalType and DayTimeIntervalType Sub-task Resolved Max Gekk
          14.
          Add an year-month interval to a date Sub-task Resolved Max Gekk
          15.
          Add an year-month interval to a timestamp Sub-task Resolved Max Gekk
          16.
          Add a day-time interval to a timestamp Sub-task Resolved Max Gekk
          17.
          Prohibit saving of day-time and year-month intervals Sub-task Resolved Max Gekk
          18.
          Multiply year-month interval by numeric Sub-task Resolved Max Gekk
          19.
          Support ANSI SQL intervals by the aggregate function `avg` Sub-task Resolved jiaan.geng
          20.
          Push ANSI interval binary expressions into into (if / case) branches Sub-task Resolved angerszhu
          21.
          Multiply day-time interval by numeric Sub-task Resolved Max Gekk
          22.
          Divide year-month interval by numeric Sub-task Resolved Max Gekk
          23.
          Divide day-time interval by numeric Sub-task Resolved Max Gekk
          24.
          Test actual size of year-month and day-time intervals Sub-task Resolved PengLei
          25.
          Hive inspect support DayTimeIntervalType and YearMonthIntervalType Sub-task Resolved angerszhu
          26.
          Return day-time interval from dates subtraction Sub-task Resolved Max Gekk
          27.
          Support ANSI intervals by date_part() Sub-task Resolved Kent Yao
          28.
          Support cast long to DayTimeIntervalType and cast Int to YearMonthIntervalType Sub-task Resolved Unassigned
          29.
          Return day-time interval from timestamps subtraction Sub-task Resolved Max Gekk
          30.
          Enable ANSI intervals in SQLQueryTestSuite Sub-task Resolved Max Gekk
          31.
          ANSI intervals formatting in hive results Sub-task Resolved Max Gekk
          32.
          Format ANSI intervals in Hive style Sub-task Resolved Max Gekk
          33.
          Transfer ANSI intervals via Hive Thrift server Sub-task Resolved Max Gekk
          34.
          Construct year-month interval column from integral fields Sub-task Resolved angerszhu
          35.
          Test transferring year-month interval via Hive Thrift server Sub-task Resolved Max Gekk
          36.
          Cast string to year-month interval Sub-task Resolved angerszhu
          37.
          Recognize sign before the interval string in literals Sub-task Resolved Max Gekk
          38.
          Add a day-time interval to a date Sub-task Resolved Max Gekk
          39.
          Parse interval literals as ANSI intervals Sub-task Resolved Max Gekk
          40.
          Get columns operation should handle ANSI interval column properly Sub-task Resolved jiaan.geng
          41.
          Accept ANSI intervals by the Sequence expression Sub-task Resolved jiaan.geng
          42.
          Extract a field from ANSI interval Sub-task Resolved Kent Yao
          43.
          IntervalUtils.fromYearMonthString can't handle Int.MinValue correctly Sub-task Resolved angerszhu
          44.
          Use ANSI intervals in streaming join tests Sub-task Resolved Kousuke Saruta
          45.
          Convert ANSI interval literals to SQL string Sub-task Resolved Max Gekk
          46.
          Parse unit-to-unit interval literals to ANSI intervals Sub-task Resolved Max Gekk
          47.
          Handle ANSI intervals in WindowExecBase Sub-task Resolved jiaan.geng
          48.
          Cast string to day-time interval Sub-task Resolved angerszhu
          49.
          Support ANSI intervals in the Hash expression Sub-task Resolved angerszhu
          50.
          Test ANSI interval literals Sub-task Resolved Max Gekk
          51.
          Test ANSI intervals in MutableProjectionSuite Sub-task Resolved Max Gekk
          52.
          The generated data fits the precision of DayTimeIntervalType in spark Sub-task Resolved jiaan.geng
          53.
          Add tests for ANSI intervals to HiveThriftBinaryServerSuite Sub-task Resolved angerszhu
          54.
          Construct day-time interval column from integral fields Sub-task Resolved Max Gekk
          55.
          Support ANSI intervals as Arrow Column vectors Sub-task Resolved PengLei
          56.
          Override `sql()` of ANSI interval operators Sub-task Resolved Max Gekk
          57.
          Wrong result of min ANSI interval division by -1 Sub-task Resolved angerszhu
          58.
          Failure on minimal interval literal Sub-task Resolved angerszhu
          59.
          Support columnar execution on ANSI interval types Sub-task Resolved Peng Lei
          60.
          Parse ANSI interval types in SQL Sub-task Resolved Max Gekk
          61.
          Support fields by year-month interval type Sub-task Resolved Max Gekk
          62.
          Support fields by the day-time interval type Sub-task Resolved Max Gekk
          63.
          Truncate java.time.Duration by fields of day-time interval type Sub-task Resolved angerszhu
          64.
          Return INTERVAL DAY from dates subtraction Sub-task Resolved PengLei
          65.
          Check multiply/divide of day-time intervals of any fields by numeric Sub-task Resolved PengLei
          66.
          Check all day-time interval types in aggregate expressions Sub-task Resolved Kousuke Saruta
          67.
          Check all day-time interval types in UDF Sub-task Resolved angerszhu
          68.
          Check all day-time interval types in arrow Sub-task Resolved angerszhu
          69.
          Parse DayTimeIntervalType from JSON Sub-task Resolved angerszhu
          70.
          Check all day-time interval types in HiveInspectors tests Sub-task Resolved angerszhu
          71.
          Format day-time intervals using type fields Sub-task Resolved Kousuke Saruta
          72.
          Take into account day-time interval fields in cast Sub-task Resolved angerszhu
          73.
          Parse any day-time interval types in SQL Sub-task Resolved Kousuke Saruta
          74.
          Parse day-time interval literals to tightest types Sub-task Resolved Kousuke Saruta
          75.
          Parse unit list interval literals as year-month/day-time interval types Sub-task Resolved Kousuke Saruta
          76.
          Take into account year-month interval fields in cast Sub-task Resolved angerszhu
          77.
          Truncate java.time.Period by fields of year-month interval type Sub-task Resolved angerszhu
          78.
          Parse YearMonthIntervalType from JSON Sub-task Resolved angerszhu
          79.
          Format year-month intervals using type fields Sub-task Resolved Kousuke Saruta
          80.
          Check all year-month interval types in HiveInspectors tests Sub-task Resolved angerszhu
          81.
          Parse year-month interval literals to tightest types Sub-task Resolved Kousuke Saruta
          82.
          Parse any year-month interval types in SQL Sub-task Resolved Kousuke Saruta
          83.
          Check all year-month interval types in aggregate expressions Sub-task Resolved Kousuke Saruta
          84.
          Check all year-month interval types in arrow Sub-task Resolved Apache Spark
          85.
          Check all year-month interval types in UDF Sub-task Resolved angerszhu
          86.
          Check multiply/divide of year-month intervals of any fields by numeric Sub-task Resolved PengLei
          87.
          Allow delayThreshold for watermark to be represented as ANSI day-time/year-month interval literals Sub-task Resolved Kousuke Saruta
          88.
          Support cast between different field YearMonthIntervalType Sub-task Resolved angerszhu
          89.
          Support cast between different DayTimeIntervalType Sub-task Resolved angerszhu
          90.
          Show proper error message when update column types to year-month/day-time interval Sub-task Resolved Apache Spark
          91.
          Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType` Sub-task Resolved Max Gekk
          92.
          Improve the implementation for DateType +/- DayTimeIntervalType(DAY) Sub-task Resolved PengLei
          93.
          Move new interval type test cases from CastSuite to CastBaseSuite Sub-task Resolved Gengliang Wang
          94.
          Support upcast between different field of YearMonthIntervalType/DayTimeIntervalType Sub-task Resolved angerszhu
          95.
          Literal.create(value, dataType) should support fields Sub-task Resolved angerszhu
          96.
          Unify IntervalUtils.castStringToYMInterval with parser Sub-task Open Unassigned
          97.
          Unify IntervalUtils.castStringToDTInterval with parser Sub-task Open Unassigned
          98.
          Support DayTimeIntervalType in width-bucket function Sub-task Resolved PengLei
          99.
          Support YearMonthIntervalType in width-bucket function Sub-task Resolved PengLei
          100.
          Allow from_json/to_json for map types where value types are year-month intervals Sub-task Resolved Kousuke Saruta
          101.
          Allow from_json/to_json for map types where value types are day-time intervals Sub-task Resolved Kousuke Saruta
          102.
          Make from_csv/to_csv to handle year-month intervals properly Sub-task Resolved Kousuke Saruta
          103.
          Make from_csv/to_csv to handle day-time intervals properly Sub-task Resolved Kousuke Saruta
          104.
          Incorrect parsing of the start field in interval literals Sub-task Resolved angerszhu
          105.
          Respect interval fields in extract Sub-task Resolved Kousuke Saruta
          106.
          Confusing error from casting a string to ANSI interval Sub-task Resolved angerszhu
          107.
          Interval str should handle start field == end Field Sub-task Resolved Unassigned
          108.
          Remove IntervalUnit in code Sub-task Resolved angerszhu
          109.
          Change quoted interval literal (interval constructor) to be converted to ANSI interval types Sub-task Resolved Kousuke Saruta
          110.
          SparkScriptTransformation should support ANSI interval types Sub-task Resolved Kousuke Saruta
          111.
          Step by days in the Sequence expression for dates Sub-task Resolved jiaan.geng
          112.
          Update docs about mapping of ANSI interval types to Java/Scala/SQL types Sub-task Resolved Max Gekk
          113.
          Support ANSI interval literals for TimeWindow Sub-task Resolved Kousuke Saruta
          114.
          Disallow ANSI intervals in file-based datasources Sub-task Resolved Max Gekk
          115.
          Support comparison of ANSI intervals with different fields Sub-task Resolved angerszhu
          116.
          Update docs about ANSI interval literals Sub-task Resolved Max Gekk
          117.
          Support cast type constructed string to year month interval Sub-task In Progress Unassigned
          118.
          Support type constructed string as dat time interval Sub-task In Progress Unassigned
          119.
          Support Interval add/subtract NULL Sub-task Resolved Gengliang Wang
          120.
          Test Interval multiply / divide null Sub-task Resolved Gengliang Wang
          121.
          Disallow comparison between Interval and String Sub-task Resolved Gengliang Wang
          122.
          Add common class/trait for ANSI interval types Sub-task Resolved Max Gekk
          123.
          DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero. Sub-task Resolved jiaan.geng
          124.
          DivideDTInterval should throw the same exception when divide by zero. Sub-task Resolved Unassigned
          125.
          day-time interval types should respect daylight saving time correctly Sub-task Open Unassigned
          126.
          Merge ANSI interval types to a tightest common type Sub-task Resolved Max Gekk
          127.
          Read/write dataframes with ANSI intervals from/to parquet files Sub-task Resolved Max Gekk
          128.
          Read/write dataframes with ANSI intervals from/to JSON files Sub-task Resolved Kousuke Saruta
          129.
          Read/write dataframes with ANSI intervals from/to CSV files Sub-task Resolved Kousuke Saruta
          130.
          Incorrect parsing of negative ANSI typed interval literals Sub-task Resolved Peng Lei
          131.
          Test ANSI interval support by the Parquet datasource Sub-task Resolved Max Gekk
          132.
          Parquet reader fails on load of ANSI interval when off-heap is enabled Sub-task Resolved Max Gekk
          133.
          Pushdown filters with ANSI interval values to parquet Sub-task Resolved Max Gekk
          134.
          Support ANSI intervals by ABS Sub-task Resolved Max Gekk
          135.
          The DIV function should support ANSI intervals Sub-task Resolved PengLei
          136.
          The SIGN/SIGNUM functions should support ANSI intervals Sub-task Resolved PengLei
          137.
          Allow coercing of an interval expression to a specific interval type Sub-task Open Unassigned
          138.
          CAST between ANSI intervals and numerics Sub-task Resolved PengLei
          139.
          Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray Sub-task Resolved PengLei
          140.
          Read/write dataframes with ANSI intervals from/to ORC files Sub-task Resolved Kousuke Saruta
          141.
          Check saving of a dataframe with ANSI intervals to a Hive parquet table Sub-task Resolved Apache Spark
          142.
          Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet Sub-task Resolved Max Gekk
          143.
          Fix CREATE TABLE AS SELECT of ANSI intervals Sub-task Resolved Max Gekk
          144.
          Pushdown filters with ANSI interval values to ORC Sub-task Resolved Kousuke Saruta
          145.
          Support AnsiInterval radix sort Sub-task Resolved XiDuo You
          146.
          Support ANSI Interval in functions that support numeric type Sub-task Resolved angerszhu
          147.
          RowToColumnConverter support AnsiIntervalType Sub-task Resolved PengLei
          148.
          Read/write dataframes with ANSI intervals from/to Avro files Sub-task Resolved Max Gekk
          149.
          Dynamic writes/reads of ANSI interval partitions Sub-task Resolved Max Gekk
          150.
          Cannot read partitioned parquet files with ANSI interval partition values Sub-task Resolved Max Gekk
          151.
          Check adding partitions with ANSI intervals Sub-task Resolved Max Gekk
          152.
          Check inserting of ANSI intervals into a table partitioned by the interval columns Sub-task Resolved Max Gekk
          153.
          Check replacing columns with ANSI intervals Sub-task Resolved Max Gekk
          154.
          Check adding of ANSI interval columns to v1/v2 tables Sub-task Resolved Max Gekk

          Activity

            People

              apachespark Apache Spark
              maxgekk Max Gekk
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated: