Apache Arrow / ARROW-10663

[C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: C++

    Description

      The C++ documentation of SetLookupOptions gives this explanation of the skip_nulls option:

        /// Whether nulls in `value_set` count for lookup.
        ///
        /// If true, any null in `value_set` is ignored and nulls in the input
        /// produce null (IndexIn) or false (IsIn) values in the output.
        /// If false, any null in `value_set` is successfully matched in
        /// the input.
        bool skip_nulls;
      

      (from https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)

      However, for IsIn this explanation doesn't seem to hold in practice:

      In [16]: arr = pa.array([1, 2, None])
      
      In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
      Out[17]: 
      <pyarrow.lib.BooleanArray object at 0x7fcf666f9408>
      [
        true,
        false,
        true
      ]
      
      In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
      Out[18]: 
      <pyarrow.lib.BooleanArray object at 0x7fcf666b13a8>
      [
        true,
        false,
        true
      ]
      

      This documentation was added in https://github.com/apache/arrow/pull/7695 (ARROW-8989).
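      To make the documented behaviour concrete, here is a minimal pure-Python sketch of IsIn as the doc comment describes it. `is_in_reference` is a hypothetical helper (not part of pyarrow or Arrow C++), and Python `None` stands in for an Arrow null. Under this reading, the first call above (skip_null=True) should have returned [true, false, false]; only the output of the second call (skip_null=False) matches the documentation.

```python
from typing import List, Optional, Sequence


def is_in_reference(values: Sequence[Optional[int]],
                    value_set: Sequence[Optional[int]],
                    skip_nulls: bool) -> List[bool]:
    """Hypothetical model of IsIn following the SetLookupOptions doc comment.

    None models an Arrow null. This is a sketch of the documented
    semantics, not Arrow's actual implementation.
    """
    # Nulls in value_set only count for lookup when skip_nulls is False.
    null_matches = (not skip_nulls) and any(v is None for v in value_set)
    non_null_set = {v for v in value_set if v is not None}
    out = []
    for v in values:
        if v is None:
            # A null input is "in" the set only if a null in value_set may match it.
            out.append(null_matches)
        else:
            out.append(v in non_null_set)
    return out


arr = [1, 2, None]
value_set = [1, None]
print(is_in_reference(arr, value_set, skip_nulls=True))   # [True, False, False]
print(is_in_reference(arr, value_set, skip_nulls=False))  # [True, False, True]
```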

      BTW, for "index_in" the option does work as documented:

      In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
      Out[19]: 
      <pyarrow.lib.Int32Array object at 0x7fcf666f04c8>
      [
        0,
        null,
        null
      ]
      
      In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
      Out[20]: 
      <pyarrow.lib.Int32Array object at 0x7fcf666f0ee8>
      [
        0,
        null,
        1
      ]
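      For comparison, the same kind of sketch for IndexIn reproduces both outputs above, which is consistent with the documented behaviour. `index_in_reference` is again a hypothetical helper; `None` models an Arrow null, and the sketch assumes `value_set` contains no duplicates (for duplicates it simply reports the first position, which may not be how Arrow handles them).

```python
from typing import Dict, List, Optional, Sequence


def index_in_reference(values: Sequence[Optional[int]],
                       value_set: Sequence[Optional[int]],
                       skip_nulls: bool) -> List[Optional[int]]:
    """Hypothetical model of IndexIn following the SetLookupOptions doc comment."""
    positions: Dict[int, int] = {}
    null_position: Optional[int] = None
    for i, v in enumerate(value_set):
        if v is None:
            # A null in value_set is only matchable when skip_nulls is False.
            if not skip_nulls and null_position is None:
                null_position = i
        elif v not in positions:
            positions[v] = i
    out: List[Optional[int]] = []
    for v in values:
        if v is None:
            out.append(null_position)   # null when nulls are skipped
        else:
            out.append(positions.get(v))  # null (None) when not found
    return out


arr = [1, 2, None]
value_set = [1, None]
print(index_in_reference(arr, value_set, skip_nulls=True))   # [0, None, None]
print(index_in_reference(arr, value_set, skip_nulls=False))  # [0, None, 1]
```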
      


    People

      Antoine Pitrou (apitrou)
      Joris Van den Bossche (jorisvandenbossche)
      Votes: 0
      Watchers: 3


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 2h 20m