Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
1.10.0
-
None
Description
Benchmark code:
test("Parquet write benchmark") {
  val count = 100 * 1024 * 1024
  val numIters = 5
  withTempPath { path =>
    val benchmark = new Benchmark(s"Parquet write benchmark ${spark.sparkContext.version}", 5)
    Seq("long", "string", "decimal(18, 0)", "decimal(38, 18)").foreach { dt =>
      benchmark.addCase(s"$dt type", numIters = numIters) { iter =>
        spark.range(count).selectExpr(s"cast(id as $dt) as id")
          .write.mode("overwrite").parquet(path.getAbsolutePath)
      }
    }
    benchmark.run()
  }
}
Result:
-- Spark 2.3.3-SNAPSHOT with Parquet 1.8.3
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Parquet write benchmark 2.3.3-SNAPSHOT:  Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
long type                                   10963 / 11344          0.0   2192675973.8       1.0X
string type                                 28423 / 29437          0.0   5684553922.2       0.4X
decimal(18, 0) type                         11558 / 11696          0.0   2311587203.6       0.9X
decimal(38, 18) type                        43858 / 44432          0.0   8771537663.4       0.2X

-- Spark 2.4.0-SNAPSHOT with Parquet 1.10.0
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Parquet write benchmark 2.4.0-SNAPSHOT:  Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
long type                                   11633 / 12070          0.0   2326572295.8       1.0X
string type                                 31374 / 32178          0.0   6274760187.4       0.4X
decimal(18, 0) type                         13019 / 13294          0.0   2603841925.4       0.9X
decimal(38, 18) type                        50719 / 50983          0.0  10143775007.6       0.2X
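To make the gap between the two runs easier to read, the following sketch computes the relative increase in best time per data type. The best times are copied from the results above; the object and method names (SlowdownSketch, slowdownPercent) are illustrative only and not part of Spark or Parquet. By this arithmetic the increase ranges from roughly 6% for long to roughly 16% for decimal(38, 18).

```scala
object SlowdownSketch {
  // Best times in ms, copied from the results above:
  // (data type, Parquet 1.8.3, Parquet 1.10.0)
  val bestMs: Seq[(String, Double, Double)] = Seq(
    ("long", 10963.0, 11633.0),
    ("string", 28423.0, 31374.0),
    ("decimal(18, 0)", 11558.0, 13019.0),
    ("decimal(38, 18)", 43858.0, 50719.0)
  )

  // Relative increase of `after` over `before`, in percent.
  def slowdownPercent(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  def main(args: Array[String]): Unit =
    bestMs.foreach { case (dt, before, after) =>
      println(f"$dt%-16s ${slowdownPercent(before, after)}%.1f%%")
    }
}
```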
The main factor affecting performance is toByteBuffer.
If toByteBuffer is not used when comparing binary values, the result is:
-- Spark 2.4.0-SNAPSHOT with Parquet 1.10.0
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Parquet write benchmark 2.4.0-SNAPSHOT:  Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
long type                                   11171 / 11508          0.0   2234189382.0       1.0X
string type                                 30072 / 30290          0.0   6014346455.4       0.4X
decimal(18, 0) type                         12150 / 12239          0.0   2430052708.8       0.9X
decimal(38, 18) type                        44974 / 45423          0.0   8994773738.8       0.2X
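As an illustration of what avoiding a ByteBuffer during binary comparison can look like, here is a minimal sketch of an unsigned lexicographic comparison written two ways: once through java.nio.ByteBuffer wrappers and once directly on the byte arrays. This is not Parquet's comparator code and not the change that was benchmarked above; the object and method names (BinaryCompareSketch, compareViaByteBuffer, compareDirect) are hypothetical, and the only library API used is java.nio.ByteBuffer.

```scala
import java.nio.ByteBuffer

object BinaryCompareSketch {
  // Variant 1: wrap each array in a ByteBuffer first, then compare
  // byte by byte as unsigned values. Each call allocates two wrappers.
  def compareViaByteBuffer(a: Array[Byte], b: Array[Byte]): Int = {
    val x = ByteBuffer.wrap(a)
    val y = ByteBuffer.wrap(b)
    val n = math.min(x.remaining(), y.remaining())
    var i = 0
    while (i < n) {
      val c = (x.get(x.position() + i) & 0xFF) - (y.get(y.position() + i) & 0xFF)
      if (c != 0) return c
      i += 1
    }
    x.remaining() - y.remaining()
  }

  // Variant 2: the same ordering computed directly on the arrays,
  // with no intermediate ByteBuffer objects.
  def compareDirect(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xFF) - (b(i) & 0xFF)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }
}
```

Both variants return the same sign for the same inputs; the second simply skips the wrapper objects. Whether that accounts for the full difference measured above is not something this sketch demonstrates.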