Description
DateTimeBenchmark shows a regression in toJavaDate (the "Collect dates" case).
Spark 2.4.6-SNAPSHOT, measured at the PR https://github.com/MaxGekk/spark/pull/27:
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                  559            603          38          8.9         111.8       1.0X
Collect dates                                      2306           3221        1558          2.2         461.1       0.2X
Current master:
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                 1052           1130          73          4.8         210.3       1.0X
Collect dates                                      3251           4943        1624          1.5         650.2       0.3X
If we subtract the cost of preparing the DATE column (the "From java.sql.Date" row) from "Collect dates":
- Spark 2.4.6-SNAPSHOT is (461.1 - 111.8) = 349.3 ns/row
- master is (650.2 - 210.3) = 439.9 ns/row
The regression of toJavaDate in master against Spark 2.4.6-SNAPSHOT is (439.9 - 349.3)/349.3 = about 26%
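The arithmetic above can be reproduced with a short script. This is only a sketch: the function names are hypothetical, and the inputs are the Per Row(ns) values copied from the two benchmark runs. Evaluated exactly, 650.2 - 210.3 is 439.9 ns/row, which puts the regression at roughly 26%.

```python
def net_cost(collect_ns: float, prepare_ns: float) -> float:
    """Per-row cost of collecting dates, minus the cost of preparing the DATE column."""
    return collect_ns - prepare_ns


def regression_pct(baseline_ns: float, current_ns: float) -> float:
    """Slowdown of current relative to baseline, in percent."""
    return (current_ns - baseline_ns) / baseline_ns * 100.0


# Per Row(ns) values from the benchmark output: "Collect dates" and "From java.sql.Date".
spark_246 = net_cost(collect_ns=461.1, prepare_ns=111.8)
master = net_cost(collect_ns=650.2, prepare_ns=210.3)

print(f"Spark 2.4.6-SNAPSHOT: {spark_246:.1f} ns/row")
print(f"master:               {master:.1f} ns/row")
print(f"regression:           {regression_pct(spark_246, master):.1f}%")
```

Note that the Stdev(ms) of the "Collect dates" rows is large in both runs, so the percentage is an estimate rather than a precise figure.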
Issue Links
- causes: SPARK-31449 Investigate the difference between JDK and Spark's time zone offset calculation (Resolved)
- is a clone of: SPARK-31439 Perf regression of fromJavaDate (Resolved)
- links to