Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Performance of joins slows down dramatically with smaller batches.
The issue is related to the slow performance of MutableArrayData::new() when it is passed a large number of arrays. The join passes in all of the batches from the build side, and it does so once per build-side join key for each probe-side batch.
The call appears to get slower much faster than linearly as the number of arrays increases, even though the total number of rows stays the same.
I modified hash_join.rs to have this debug code:
let start = Instant::now();
let row_count: usize = arrays.iter().map(|arr| arr.len()).sum();
let num_arrays = arrays.len();
let mut mutable = MutableArrayData::new(arrays, true, capacity);
if num_arrays > 0 {
    debug!("MutableArrayData::new() with {} arrays containing {} rows took {} ms", num_arrays, row_count, start.elapsed().as_millis());
}
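For anyone who wants to reproduce the measurement pattern outside of hash_join.rs, here is a minimal std-only sketch. It is not the DataFusion code: build_index() is a hypothetical stand-in for the constructor being timed, timed() is a helper introduced only for this illustration, and the batch sizes are arbitrary. It only shows the approach of holding the total row count fixed while varying the number of arrays.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Hypothetical stand-in for the constructor under test (NOT MutableArrayData::new):
/// it records one length per input array, so its cost depends on the array count.
fn build_index(arrays: &[Vec<u8>]) -> Vec<usize> {
    arrays.iter().map(|a| a.len()).collect()
}

fn main() {
    // Same total number of rows in every run; only the batch size changes.
    let total_rows: usize = 1 << 16;
    for batch_size in [16_384usize, 4_096, 1_024] {
        let arrays: Vec<Vec<u8>> = (0..total_rows / batch_size)
            .map(|_| vec![0u8; batch_size])
            .collect();
        let row_count: usize = arrays.iter().map(|a| a.len()).sum();
        let (index, elapsed) = timed(|| build_index(&arrays));
        println!(
            "build_index() with {} arrays containing {} rows took {} us",
            index.len(),
            row_count,
            elapsed.as_micros()
        );
    }
}
```

Swapping the stand-in for the real constructor call gives the same kind of log lines as the debug code above.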
Batch size 131072:
MutableArrayData::new() with 4584 arrays containing 3115341 rows took 1 ms
MutableArrayData::new() with 4584 arrays containing 3115341 rows took 1 ms
MutableArrayData::new() with 4584 arrays containing 3115341 rows took 1 ms
Batch size 16384:
MutableArrayData::new() with 36624 arrays containing 3115341 rows took 19 ms
MutableArrayData::new() with 36624 arrays containing 3115341 rows took 16 ms
MutableArrayData::new() with 36624 arrays containing 3115341 rows took 17 ms
Batch size 4096:
MutableArrayData::new() with 146496 arrays containing 3115341 rows took 88 ms
MutableArrayData::new() with 146496 arrays containing 3115341 rows took 89 ms
MutableArrayData::new() with 146496 arrays containing 3115341 rows took 88 ms
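To put a rough number on how the time scales with the array count, the following sketch compares the two smaller batch sizes using the timings logged above. The helper names (Sample, growth_factor, mean) are made up for this illustration, and the 1 ms samples are left out because millisecond resolution makes them too coarse to compare.

```rust
/// One measurement: (number of arrays, elapsed milliseconds).
type Sample = (f64, f64);

/// Ratio of time growth to array-count growth between two samples.
/// Near 1.0 means linear scaling; above 1.0 means worse than linear.
fn growth_factor(a: Sample, b: Sample) -> f64 {
    (b.1 / a.1) / (b.0 / a.0)
}

/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    // Timings copied from the log lines above, averaged per batch size.
    let batch_16384: Sample = (36_624.0, mean(&[19.0, 16.0, 17.0]));
    let batch_4096: Sample = (146_496.0, mean(&[88.0, 89.0, 88.0]));

    println!("arrays grew {:.1}x", batch_4096.0 / batch_16384.0);
    println!("time grew {:.1}x", batch_4096.1 / batch_16384.1);
    println!("growth factor {:.2}", growth_factor(batch_16384, batch_4096));
}
```

On these samples, 4x as many arrays costs roughly 5.1x the time, a growth factor of about 1.27. That is worse than linear, though three runs per batch size are too few to pin down the exact growth curve.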