  1. Apache Arrow
  2. ARROW-6923

[C++] Option for Filter kernel how to handle nulls in the selection vector


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      How a filter kernel handles nulls in the boolean mask (selection vector) varies between languages and data analytics systems: base R propagates nulls, dplyr skips them (treats them as False), SQL generally skips them as well, I think, and Julia raises an error.

      Currently, in Arrow C++ we "propagate" nulls (null in the selection vector gives a null in the output):

      In [7]: arr = pa.array([1, 2, 3]) 
      
      In [8]: mask = pa.array([True, False, None]) 
      
      In [9]: arr.filter(mask) 
      Out[9]: 
      <pyarrow.lib.Int64Array object at 0x7fefe44b3048>
      [
        1,
        null
      ]
      

      Given the different ways this could be done (propagate, skip, error), should we provide an option to control this behaviour?
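
To make the three candidate behaviours concrete, here is a minimal pure-Python sketch that operates on plain lists, with `None` standing in for null. The function `filter_with_null_policy` and its `null_policy` parameter are hypothetical names chosen for illustration; they are not part of the Arrow C++ or pyarrow API, and the sketch says nothing about how such an option would actually be implemented in the kernel.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical policy names, taken from the wording of this issue.
_POLICIES = ("propagate", "skip", "error")


def filter_with_null_policy(
    values: List[T],
    mask: List[Optional[bool]],
    null_policy: str = "propagate",
) -> List[Optional[T]]:
    """Filter `values` by `mask`, handling null (None) mask entries per `null_policy`."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    if null_policy not in _POLICIES:
        raise ValueError(f"null_policy must be one of {_POLICIES}")

    out: List[Optional[T]] = []
    for value, keep in zip(values, mask):
        if keep is None:
            if null_policy == "propagate":
                # A null in the mask yields a null in the output.
                out.append(None)
            elif null_policy == "error":
                raise ValueError("null encountered in selection vector")
            # "skip": treat the null as False and drop the value.
        elif keep:
            out.append(value)
    return out


values = [1, 2, 3]
mask = [True, False, None]

propagated = filter_with_null_policy(values, mask, "propagate")  # [1, None]
skipped = filter_with_null_policy(values, mask, "skip")          # [1]
```

With the same inputs as the session above, "propagate" reproduces the output shown there, "skip" drops the third element, and "error" raises a `ValueError`.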

            People

              Assignee: Unassigned
              Reporter: Joris Van den Bossche (jorisvandenbossche)