Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
3.0.1, 3.1.0
Description
The idea is to improve the performance of HybridStore by adding batch write support to LevelDB. HybridStore was introduced in https://issues.apache.org/jira/browse/SPARK-31608. It first writes data to InMemoryStore and, once writing to InMemoryStore is complete, uses a background thread to dump the data to LevelDB. In the comments on https://github.com/apache/spark/pull/28412, Mridul Muralidharan pointed out that batch writing can speed up this dumping process, and he wrote the code for writeAll().
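The difference between the two dumping strategies can be sketched with a toy model. This is only an illustration under stated assumptions, not Spark's implementation: `ToyDiskStore`, `dumpInBackground`, and the notion of a "commit" (one synchronous trip to disk) are hypothetical names invented for this example, and the real LevelDB store behaves differently in detail.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: ToyDiskStore is a hypothetical stand-in for a
// disk-backed store, not Spark's LevelDB class.
class BatchWriteSketch {

    static class ToyDiskStore {
        private final Map<String, String> data = new TreeMap<>();
        private int commits = 0; // models the number of synchronous trips to disk

        // One-by-one write: every object pays for its own commit.
        synchronized void write(String key, String value) {
            data.put(key, value);
            commits++;
        }

        // Batch write: all objects are staged together and committed once.
        synchronized void writeAll(Map<String, String> batch) {
            data.putAll(batch);
            commits++;
        }

        synchronized int commits() { return commits; }
        synchronized int size() { return data.size(); }
    }

    // Dump everything held in memory to the disk store on a background thread,
    // loosely mirroring the switch from the in-memory store to the disk store.
    static Thread dumpInBackground(Map<String, String> inMemory,
                                   ToyDiskStore disk, boolean batch) {
        Thread t = new Thread(() -> {
            if (batch) {
                disk.writeAll(inMemory);
            } else {
                inMemory.forEach(disk::write);
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> inMemory = new TreeMap<>();
        for (int i = 0; i < 1024; i++) {
            inMemory.put("task-" + i, "payload-" + i);
        }
        ToyDiskStore oneByOne = new ToyDiskStore();
        ToyDiskStore batched = new ToyDiskStore();
        dumpInBackground(inMemory, oneByOne, false).join();
        dumpInBackground(inMemory, batched, true).join();
        System.out.println("write():    " + oneByOne.commits() + " commits");
        System.out.println("writeAll(): " + batched.commits() + " commits");
    }
}
```

In this model both stores end up holding the same 1024 entries, but the one-by-one path pays the per-commit cost 1024 times and the batch path pays it once. That is the intuition for why batching helps most when each trip to disk is expensive.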
I compared the HybridStore switching time for one-by-one writes and batch writes on an HDD. When the disk is idle, batch writing is around 25% faster; when the disk is 100% busy, it is 7x to 10x faster.
When the disk is at 0% utilization:
Log size, jobs, and tasks per job | Switching time with write() (original) | Switching time with writeAll() |
---|---|---|
133 MB, 400 jobs, 100 tasks per job | 16s | 13s |
265 MB, 400 jobs, 200 tasks per job | 30s | 23s |
1.3 GB, 1000 jobs, 400 tasks per job | 136s | 108s |
When the disk is at 100% utilization:
Log size, jobs, and tasks per job | Switching time with write() (original) | Switching time with writeAll() |
---|---|---|
133 MB, 400 jobs, 100 tasks per job | 116s | 17s |
265 MB, 400 jobs, 200 tasks per job | 251s | 26s |
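The speedup factors behind the quoted improvements can be recomputed from the switching times in the two tables above. The following is a small sketch; the class and method names (`SwitchingSpeedup`, `speedup`) are made up for this example, and the numbers are copied directly from the tables.

```java
class SwitchingSpeedup {

    // Speedup factor: how many times faster the writeAll() run finished
    // compared with the write() run.
    static double speedup(double writeSeconds, double writeAllSeconds) {
        return writeSeconds / writeAllSeconds;
    }

    public static void main(String[] args) {
        // Each row is {write() seconds, writeAll() seconds} from the tables.
        double[][] idleDisk = {{16, 13}, {30, 23}, {136, 108}};
        double[][] busyDisk = {{116, 17}, {251, 26}};

        for (double[] row : idleDisk) {
            System.out.printf("0%% utilization:   %.2fx%n", speedup(row[0], row[1]));
        }
        for (double[] row : busyDisk) {
            System.out.printf("100%% utilization: %.2fx%n", speedup(row[0], row[1]));
        }
    }
}
```

On the idle disk the factors come out at roughly 1.2x to 1.3x, and on the fully busy disk at roughly 6.8x and 9.7x, which is consistent with the "around 25%" and "7x to 10x" figures.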
I also ran the write-related benchmarks in LevelDBBenchmark.java and measured the total time to write 1024 objects.
When the disk is at 0% utilization:
Benchmark test | write() (ms) | writeAll() (ms) |
---|---|---|
randomUpdatesIndexed | 213.060 | 157.356 |
randomUpdatesNoIndex | 57.869 | 35.439 |
randomWritesIndexed | 298.854 | 229.274 |
randomWritesNoIndex | 66.764 | 38.361 |
sequentialUpdatesIndexed | 87.019 | 56.219 |
sequentialUpdatesNoIndex | 61.851 | 41.942 |
sequentialWritesIndexed | 94.044 | 56.534 |
sequentialWritesNoIndex | 118.345 | 66.483 |
When the disk is at 50% utilization:
Benchmark test | write() (ms) | writeAll() (ms) |
---|---|---|
randomUpdatesIndexed | 230.386 | 180.817 |
randomUpdatesNoIndex | 58.935 | 50.113 |
randomWritesIndexed | 315.241 | 254.400 |
randomWritesNoIndex | 96.709 | 41.164 |
sequentialUpdatesIndexed | 89.971 | 70.387 |
sequentialUpdatesNoIndex | 72.021 | 53.769 |
sequentialWritesIndexed | 103.052 | 67.358 |
sequentialWritesNoIndex | 76.194 | 99.037 |
Issue Links
- is related to SPARK-31608 Add a hybrid KVStore to make UI loading faster (Resolved)