SPARK-39146

The singleton Jackson ObjectMapper should be preferred

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      I wrote a micro-benchmark to test Jackson ObjectMapper read and write performance:

      https://github.com/LuciferYang/spark/blob/objectMapper/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/JacksonBenchmark.scala

      and ran it using GitHub Actions (GA):

       

      OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
      Test create ObjectMapper:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      Test create ObjectMapper                            648            652           4          0.0       64819.0       1.0X

      OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
      Test write map to json:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      Test Multiple                                      2116           2127          15          0.0      211556.5       1.0X
      Test Single                                           4              4           0          2.4         416.1     508.4X

      OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1022-azure
      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
      Test read json to map:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------------------------------
      Test Multiple                                      8848           8867          27          0.0      884776.2       1.0X
      Test Single   
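
      As a reading aid for the tables above, here is a small sketch of how the Rate(M/s) and Relative columns appear to follow from Per Row(ns). The formulas are an assumption inferred from the printed numbers, not taken from the benchmark source, and RelativeCheck is a hypothetical name.

      ```scala
      // Hypothetical helper, not part of Spark or of the linked benchmark.
      object RelativeCheck {
        // Per Row(ns) values copied from the "Test write map to json" table.
        val multipleNsPerRow: Double = 211556.5
        val singleNsPerRow: Double = 416.1

        // Assumed meaning of "Relative": baseline cost divided by this case's cost.
        def relative(baselineNs: Double, caseNs: Double): Double = baselineNs / caseNs

        // Assumed meaning of "Rate(M/s)": millions of rows per second,
        // which is 1000 divided by the nanoseconds spent per row.
        def rateMillionsPerSec(nsPerRow: Double): Double = 1000.0 / nsPerRow

        def main(args: Array[String]): Unit = {
          println(relative(multipleNsPerRow, singleNsPerRow))
          println(rateMillionsPerSec(singleNsPerRow))
        }
      }
      ```

      Under these assumptions the computed values round to the 508.4X and 2.4 shown for "Test Single" in the write table.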

       

       

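      The results show that creating a new ObjectMapper instance is expensive: in the write test, a single shared instance was about 508x faster than creating multiple instances. Spark should therefore prefer a singleton Jackson ObjectMapper.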
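
      A minimal sketch of the singleton approach, assuming a holder object named SharedJson with helpers toJson and fieldAsText. These names are hypothetical and this is not the change that was merged. It relies on Jackson's documented guarantee that an ObjectMapper is thread-safe once its configuration is complete.

      ```scala
      import com.fasterxml.jackson.databind.ObjectMapper

      // Hypothetical holder object, not an existing Spark class.
      object SharedJson {
        // Built once, when the object is first accessed, and reused by every caller.
        // Any configuration should happen here, before the first read or write.
        val mapper: ObjectMapper = new ObjectMapper()

        // Serialize a Java bean or collection with the shared mapper.
        def toJson(value: AnyRef): String = mapper.writeValueAsString(value)

        // Parse a JSON object and return one field as text.
        // Assumes the field exists; JsonNode.get returns null otherwise.
        def fieldAsText(json: String, field: String): String =
          mapper.readTree(json).get(field).asText()
      }

      object SharedJsonExample {
        def main(args: Array[String]): Unit = {
          // A java.util.Map is used so that no Scala module has to be registered.
          val m = new java.util.HashMap[String, String]()
          m.put("k", "v")
          val json = SharedJson.toJson(m)
          println(json)
          println(SharedJson.fieldAsText(json, "k"))
        }
      }
      ```

      Call sites that currently build a mapper with `new ObjectMapper()` on each call could instead reference a shared instance like this one, provided they do not need different configuration.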

       

      The following files in Spark do not use a singleton ObjectMapper:

       

      common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
      core/src/main/scala/org/apache/spark/status/api/v1/JacksonMessageWriter.scala
      core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
      resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
      sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
      sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
      sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala 

      We can identify which of these are on hot paths and fix them.

          People

            LuciferYang Yang Jie
            LuciferYang Yang Jie