Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
3.4.0
None
Description
I wrote a micro-benchmark to test Jackson ObjectWriter read and write:
and ran it using GitHub Actions (GA):
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test create ObjectMapper:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test create ObjectMapper                            648            652           4          0.0       64819.0       1.0X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test write map to json:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test Multiple                                      2116           2127          15          0.0      211556.5       1.0X
Test Single                                           4              4           0          2.4         416.1     508.4X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test read json to map:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test Multiple                                      8848           8867          27          0.0      884776.2       1.0X
Test Single
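The test results suggest that we should use a singleton Jackson ObjectMapper, because creating a new ObjectMapper instance appears to be expensive: about 64.8 µs per instance in the creation test, and in the write test the "Test Single" case was about 508x faster than "Test Multiple".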
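For illustration, a minimal sketch of the singleton approach might look like the following. The class name `JsonUtils` and its two helper methods are hypothetical and are not taken from the Spark code base. The sketch assumes Jackson 2.x databind is on the classpath and relies on the documented property that an ObjectMapper is safe to share across threads as long as it is not reconfigured after first use.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Spark): holds one shared ObjectMapper
// instead of constructing a new one on every call.
final class JsonUtils {
    // Built once when the class is initialized. All configuration should
    // happen here, before the mapper is used by any caller.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private JsonUtils() {}

    // Serialize a map to a JSON string using the shared mapper.
    static String toJson(Map<String, Object> map) throws IOException {
        return MAPPER.writeValueAsString(map);
    }

    // Parse a JSON object into a map using the shared mapper.
    @SuppressWarnings("unchecked")
    static Map<String, Object> fromJson(String json) throws IOException {
        return MAPPER.readValue(json, Map.class);
    }
}
```

Call sites would then use `JsonUtils.toJson(map)` rather than `new ObjectMapper().writeValueAsString(map)`. Code that needs different mapper settings (for example, registered modules) would need its own shared instance rather than reconfiguring this one.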
The following files in Spark do not use a singleton:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
core/src/main/scala/org/apache/spark/status/api/v1/JacksonMessageWriter.scala
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
We can find the hot paths among these and fix them.