Spark / SPARK-38520

Overflow occurs when reading ANSI day time interval from CSV file


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Problem:

      Overflow occurs when reading the following positive intervals from a CSV file; the results become negative:

      interval '106751992' day     => INTERVAL '-106751990' DAY

      INTERVAL +'+2562047789' hour => INTERVAL '-2562047787' HOUR

      interval '153722867281' minute => INTERVAL '-153722867280' MINUTE
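
      For reference (background not stated in the report): Spark backs DayTimeIntervalType with a count of microseconds in a signed 64-bit Long, so the largest representable interval is Long.MaxValue microseconds, about 106751991 days. The values above are each one unit past that limit, and unchecked Long multiplication silently wraps. A minimal Scala sketch of the wraparound, using plain Long arithmetic outside Spark:

      ```scala
      // Day-time intervals are stored as microseconds in a signed 64-bit Long.
      val MicrosPerDay  = 86400L * 1000000L  // 86,400,000,000
      val MicrosPerHour = 3600L * 1000000L   // 3,600,000,000

      // Largest day count whose microseconds still fit in a Long:
      val maxDays = Long.MaxValue / MicrosPerDay
      println(maxDays)                  // 106751991

      // One day more: unchecked multiplication wraps to a negative value,
      // matching the negative DAY interval shown above.
      val wrapped = 106751992L * MicrosPerDay
      println(wrapped < 0)              // true
      println(wrapped / MicrosPerDay)   // -106751990

      // The same pattern reproduces the HOUR result:
      println((2562047789L * MicrosPerHour) / MicrosPerHour)  // -2562047787

      // A checked multiply would fail loudly instead of wrapping:
      // Math.multiplyExact(106751992L, MicrosPerDay)  // throws ArithmeticException
      ```

      This suggests the CSV parser multiplies the parsed unit count into microseconds without an overflow check; `Math.multiplyExact` is one way such a check is commonly done.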

       

      Reproduce:

      // days overflow
       scala> import org.apache.spark.sql.types._
       scala> val schema = StructType(Seq(StructField("c1",
         DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY))))
       scala> spark.read.csv(path).show(false)
       +------------------------+
       |_c0                     |
       +------------------------+
       |interval '106751992' day|
       +------------------------+
       scala> spark.read.schema(schema).csv(path).show(false)
       +-------------------------+
       |c1                       |
       +-------------------------+
       |INTERVAL '-106751990' DAY|
       +-------------------------+
        // hour overflow
       scala> val schema = StructType(Seq(StructField("c1",
         DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.HOUR))))
       scala> spark.read.csv(path).show(false)
       +----------------------------+
       |_c0                         |
       +----------------------------+
       |INTERVAL +'+2562047789' hour|
       +----------------------------+
       scala> spark.read.schema(schema).csv(path).show(false)
       +---------------------------+
       |c1                         |
       +---------------------------+
       |INTERVAL '-2562047787' HOUR|
       +---------------------------+
       // minute overflow
       scala> val schema = StructType(Seq(StructField("c1",
         DayTimeIntervalType(DayTimeIntervalType.MINUTE, DayTimeIntervalType.MINUTE))))
       scala> spark.read.csv(path).show(false)
       +------------------------------+
       |_c0                           |
       +------------------------------+
       |interval '153722867281' minute|
       +------------------------------+
       scala> spark.read.schema(schema).csv(path).show(false)
       +-------------------------------+
       |c1                             |
       +-------------------------------+
       |INTERVAL '-153722867280' MINUTE|
       +-------------------------------+
      

       

      Others:

      The opposite direction should also be checked: a negative interval below the minimum is read back as a positive value.
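
      The same Long arithmetic predicts the symmetric failure (an expectation to verify, not confirmed Spark output): a negative day count one past the minimum wraps to a positive value.

      ```scala
      val MicrosPerDay = 86400L * 1000000L  // 86,400,000,000

      // Smallest day count whose microseconds still fit in a Long:
      val minDays = Long.MinValue / MicrosPerDay
      println(minDays)                 // -106751991

      // One day below: unchecked multiplication wraps to a positive value,
      // so interval '-106751992' day would be expected to read back as positive.
      val wrapped = -106751992L * MicrosPerDay
      println(wrapped > 0)             // true
      println(wrapped / MicrosPerDay)  // 106751990
      ```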

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: chong (chongg@nvidia)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: