SPARK-31426 (sub-task of SPARK-31404: file source backward compatibility after calendar switch)

Regression in loading/saving timestamps from/to ORC files

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      Here are the results of DateTimeRebaseBenchmark on the current master branch:

      Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      after 1582                                        59877          59877           0          1.7         598.8       0.0X
      before 1582                                       61361          61361           0          1.6         613.6       0.0X
      
      Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      after 1582, vec off                               48197          48288         118          2.1         482.0       1.0X
      after 1582, vec on                                38247          38351         128          2.6         382.5       1.3X
      before 1582, vec off                              53179          53359         249          1.9         531.8       0.9X
      before 1582, vec on                               44076          44268         269          2.3         440.8       1.1X
      

      The results of the same benchmark on Spark 2.4.6-SNAPSHOT:

      Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      after 1582                                        18858          18858           0          5.3         188.6       1.0X
      before 1582                                       18508          18508           0          5.4         185.1       1.0X
      
      Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      after 1582, vec off                               14063          14177         143          7.1         140.6       1.0X
      after 1582, vec on                                 5955           6029         100         16.8          59.5       2.4X
      before 1582, vec off                              14119          14126           7          7.1         141.2       1.0X
      before 1582, vec on                                5991           6007          25         16.7          59.9       2.3X
      

       Here is the PR with DateTimeRebaseBenchmark backported to 2.4: https://github.com/MaxGekk/spark/pull/27
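
       To make the size of the regression explicit, the best times from the two runs can be divided case by case. The sketch below is only an illustration: the dictionary keys and the `slowdown` helper are names invented here, and the numbers are the Best Time(ms) values copied from the tables above.

```python
# Best Time (ms) values copied from the DateTimeRebaseBenchmark tables above.
# The dictionary keys are illustrative labels, not benchmark identifiers.
MASTER = {
    "save, after 1582": 59877,
    "save, before 1582": 61361,
    "load, after 1582, vec off": 48197,
    "load, after 1582, vec on": 38247,
    "load, before 1582, vec off": 53179,
    "load, before 1582, vec on": 44076,
}

SPARK_2_4 = {
    "save, after 1582": 18858,
    "save, before 1582": 18508,
    "load, after 1582, vec off": 14063,
    "load, after 1582, vec on": 5955,
    "load, before 1582, vec off": 14119,
    "load, before 1582, vec on": 5991,
}


def slowdown(master_ms, baseline_ms):
    """Return how many times slower the master run is than the baseline run."""
    return master_ms / baseline_ms


# One ratio per benchmark case, master relative to 2.4.6-SNAPSHOT.
ratios = {case: slowdown(MASTER[case], SPARK_2_4[case]) for case in MASTER}

for case, ratio in ratios.items():
    print(f"{case:<28} {ratio:.2f}x slower")
```

       With these numbers, every case is more than 3x slower on master, and the vectorized load cases are the most affected (over 6x).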

              People

              • Assignee: Max Gekk (maxgekk)
              • Reporter: Max Gekk (maxgekk)
