Apache Arrow / ARROW-11846

[Rust] Specify behavior of filter kernel on `null`


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust
    • Labels: None

    Description

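      Currently, the behavior of `filter` is undefined when the filter array (the `BooleanArray` predicate) contains null values: the kernel ignores the predicate's null bits, so whether a null slot selects its row depends on whatever value bit happens to be stored underneath it.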
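To make the problem concrete, here is a toy model in plain Rust with no Arrow dependency. It assumes, as the proposal's "undefined" option suggests, that the kernel consults only the filter's value bits; `filter_ignoring_validity` and `logical` are hypothetical helpers, and the bitmaps are modeled as `bool` slices rather than packed buffers. Two filters that are logically identical, `[true, null, true]`, select different rows because the bit stored under the null slot differs.

```rust
/// Hypothetical toy kernel: selects `values[i]` whenever the filter's value
/// bit is set, without ever looking at the filter's validity bits.
fn filter_ignoring_validity(values: &[i32], value_bits: &[bool]) -> Vec<i32> {
    values
        .iter()
        .zip(value_bits)
        .filter(|(_, bit)| **bit)
        .map(|(v, _)| *v)
        .collect()
}

/// Hypothetical helper: the logical filter a user sees (`None` = null).
fn logical(value_bits: &[bool], validity: &[bool]) -> Vec<Option<bool>> {
    value_bits
        .iter()
        .zip(validity)
        .map(|(&v, &valid)| if valid { Some(v) } else { None })
        .collect()
}

fn main() {
    let values = [1, 2, 3];
    // Slot 1 is null in both filters (validity bit 0).
    let validity = [true, false, true];
    // The value bit stored under that null slot differs between the two.
    let bits_a = [true, false, true];
    let bits_b = [true, true, true];

    // Both filters are logically [true, null, true].
    assert_eq!(logical(&bits_a, &validity), logical(&bits_b, &validity));

    // Yet the toy kernel selects different rows.
    assert_eq!(filter_ignoring_validity(&values, &bits_a), vec![1, 3]);
    assert_eq!(filter_ignoring_validity(&values, &bits_b), vec![1, 2, 3]);
}
```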

      The undefined behavior causes problems whenever the filter is a `boolean` array that contains `null` values. For instance, I wrote a `null_to_false` function that has to manipulate the underlying buffers directly, combining the null bits with the values so that every null becomes `false`. The C++ `filter` kernel, by contrast, lets the caller specify how nulls are handled. Thoughts on adding a method that takes an additional parameter to configure this behavior, and then picking a "default" behavior for the existing implementation?

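      use arrow::array::{Array, ArrayRef, BooleanArray};
      use arrow::error::Result;

      /// How `filter` treats slots where the filter array is null.
      #[derive(Debug, Clone, Copy, PartialEq, Eq)]
      pub enum NullFilterBehavior {
        /// Include values where the filter was null.
        Emit,
        /// Exclude values where the filter was null.
        Skip,
        /// Ignore the null bits; the result depends on the value bit
        /// stored under each null slot.
        Undefined,
      }
      
      /// Options for `filter_config`.
      #[derive(Debug, Clone, Copy)]
      pub struct FilterConfig {
        pub null_behavior: NullFilterBehavior,
      }
      
      impl Default for FilterConfig {
        fn default() -> Self {
          Self {
            // Keeps the current behavior for existing callers.
            null_behavior: NullFilterBehavior::Undefined,
          }
        }
      }
      
      /// Existing entry point: filter using the default configuration.
      pub fn filter(array: &dyn Array, filter: &BooleanArray) -> Result<ArrayRef> {
        filter_config(array, filter, FilterConfig::default())
      }
      
      /// New entry point that takes an explicit configuration.
      pub fn filter_config(
        array: &dyn Array,
        filter: &BooleanArray,
        config: FilterConfig,
      ) -> Result<ArrayRef> {
        // Implementation elided in the proposal.
        todo!()
      }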
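The intended semantics of the two defined options can be pinned down with a small reference implementation over plain slices, where `None` stands for a null filter slot. This is a sketch, not the Arrow kernel: `filter_values` is a hypothetical helper, the variants are written in Rust's usual CamelCase, and the undefined option is left out because it has no well-defined result in this model.

```rust
/// How a null slot in the filter is treated (variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullFilterBehavior {
    /// Keep the value where the filter slot is null.
    Emit,
    /// Drop the value where the filter slot is null.
    Skip,
}

/// Hypothetical reference semantics: keep `values[i]` when `predicate[i]`
/// is `Some(true)`, drop it when `Some(false)`, and consult `behavior`
/// when it is `None` (null).
fn filter_values<T: Clone>(
    values: &[T],
    predicate: &[Option<bool>],
    behavior: NullFilterBehavior,
) -> Vec<T> {
    assert_eq!(values.len(), predicate.len(), "length mismatch");
    values
        .iter()
        .zip(predicate)
        .filter(|(_, p)| match p {
            Some(keep) => *keep,
            None => behavior == NullFilterBehavior::Emit,
        })
        .map(|(v, _)| v.clone())
        .collect()
}

fn main() {
    let values = [10, 20, 30, 40];
    let predicate = [Some(true), None, Some(false), Some(true)];

    let emitted = filter_values(&values, &predicate, NullFilterBehavior::Emit);
    let skipped = filter_values(&values, &predicate, NullFilterBehavior::Skip);

    assert_eq!(emitted, vec![10, 20, 40]);
    assert_eq!(skipped, vec![10, 40]);
    println!("emit: {:?}, skip: {:?}", emitted, skipped);
}
```

As proposed here, `Emit` keeps the original value. Arrow C++'s `FilterOptions` takes a different approach: its `EMIT_NULL` null-selection behavior writes a null into the output at that position, while `DROP` skips the row.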
      

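      It seems this could be implemented by having the `BitChunksIterator` combine each 64-bit chunk of the filter's value bits with the corresponding chunk of its validity bits before passing it to the BitSlices iterator: AND with the validity bits to skip nulls, or OR with the inverted validity bits to emit them.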
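A minimal sketch of that AND / OR step on raw `u64` words, with no dependency on the Arrow crate. `combine_chunk` and `selected_runs` are hypothetical helpers, not the Arrow Rust API; bitmaps are assumed LSB-first with a zero offset, and `selected_runs` walks bit by bit for clarity where a real kernel would work on whole words.

```rust
/// Illustrative copy of the proposed behaviors (CamelCase names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullFilterBehavior {
    Emit,
    Skip,
}

/// Hypothetical helper: combine one 64-bit chunk of filter value bits with
/// the matching chunk of validity bits (1 = valid, LSB-first).
fn combine_chunk(values: u64, validity: u64, behavior: NullFilterBehavior) -> u64 {
    match behavior {
        // AND: a null slot (validity 0) is forced to 0, so it is dropped.
        NullFilterBehavior::Skip => values & validity,
        // OR with the inverted validity: a null slot is forced to 1, so it is kept.
        NullFilterBehavior::Emit => values | !validity,
    }
}

/// Hypothetical helper: turn combined chunks into contiguous `[start, end)`
/// runs of selected slots. Bits at or beyond `len` (for example padding
/// bits set by `!validity`) are ignored.
fn selected_runs(chunks: &[u64], len: usize) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for i in 0..len {
        let selected = ((chunks[i / 64] >> (i % 64)) & 1) == 1;
        match (selected, start) {
            // A run of selected slots begins.
            (true, None) => start = Some(i),
            // The current run ends just before slot `i`.
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the array.
    if let Some(s) = start {
        runs.push((s, len));
    }
    runs
}

fn main() {
    // Filter of length 6, logically [true, null, false, null, true, false].
    // Value bits (slot 0 first): 1, 1, 0, 0, 1, 0 -> 0b010011
    // Validity bits:             1, 0, 1, 0, 1, 1 -> 0b110101
    let values = 0b010011u64;
    let validity = 0b110101u64;

    let skip = combine_chunk(values, validity, NullFilterBehavior::Skip);
    let emit = combine_chunk(values, validity, NullFilterBehavior::Emit);

    assert_eq!(selected_runs(&[skip], 6), vec![(0, 1), (4, 5)]);
    assert_eq!(selected_runs(&[emit], 6), vec![(0, 2), (3, 5)]);
}
```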


          People

            Assignee: Unassigned
            Reporter: Ben Chambers (bchambers)
            Votes: 0
            Watchers: 4
