Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-9523

[Rust] improve performance of filter kernel

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.17.1
    • 2.0.0
    • Rust

    Description

      The filter kernel located here https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/filter.rs

      currently has the following performance:

      filter old u8 low selectivity time: [1.7782 ms 1.7790 ms 1.7801 ms]
      filter old u8 high selectivity time: [815.58 us 816.58 us 817.57 us]
      filter old u8 w NULLs low selectivity time: [1.8131 ms 1.8231 ms 1.8336 ms]
      filter old u8 w NULLs high selectivity time: [817.41 us 820.01 us 823.05 us]

      I have been working on a new implementation which performs between approximately 14 and 480 times faster depending mostly on filter selectivity. Here are the benchmark results:

      filter u8 low selectivity time: [127.30 us 128.06 us 128.88 us]
      filter u8 high selectivity time: [5.4215 us 5.5778 us 5.7335 us]
      filter context u8 low selectivity time: [124.21 us 126.21 us 128.38 us]
      filter context u8 high selectivity time: [1.6707 us 1.7052 us 1.7476 us]
      filter context u8 w NULLs low selectivity time: [142.40 us 142.83 us 143.37 us]
      filter context u8 w NULLs high selectivity time: [2.3338 us 2.3788 us 2.4304 us]
      filter context f32 low selectivity time: [132.59 us 132.91 us 133.29 us]
      filter context f32 high selectivity time: [1.6864 us 1.7026 us 1.7212 us]

      This new implementation is based on a few key ideas:

      (1) if the data array being filtered doesn't have a null bitmap, no time should be wasted to copy or create a null bitmap in the resulting filtered data array - this is implemented using the CopyNullBit trait which has a no-op implementation and an actual implementation

      (2) when the filter is highly selective, e.g. only a small number of values from the data array are selected, the filter implementation should efficiently skip entire batches of 0s in the filter array - this is implemented by transmuting the filter array to u64 which allows to quickly check and skip entire batches of 64 bits 

      (3) when an entire record batch is filtered, any computation which only depends on the filter array is done once and then shared for filtering all the data arrays in the record batch - this is implemented using the FilterContext struct

       

      paddyhoran, andygrove let me know what you think

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              yordan-pavlov Yordan Pavlov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h 20m
                  3h 20m