Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-13013

[C++][Compute][Python] Move (majority of) kernel unit tests to python



    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • C++, Python
    • None


      mailing list discussion: https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E

      Writing unit tests for compute functions in c++ is laborious, entails a lot of boilerplate, and slows iteration since it requires recompilation when adding new tests. The majority of these test cases need not be written in C++ at all and could instead be made part of the pyarrow test suite.

      In order to make the kernels' C++ implementations easily debuggable from unit tests, we'll have to expose a c++ function named AssertCallFunction or so. AssertCallFunction will invoke the named compute::Function and compare actual results to expected without crossing the C++/python boundary, allowing a developer to step through all relevant code with a single breakpoint in GDB. Construction of scalars/arrays/function options and any other inputs to the function is amply supported by pyarrow, and will happen outside the scope of AssertCallFunction.

      AssertCallFunction should not try to derive additional assertions from its arguments - for example {{CheckScalar("add",

      {left, right}

      , expected)}} will first assert that left + right == expected then left.slice(1) + right.slice(1) == expected.slice(1) to ensure that offsets are handled correctly. This has value but can be easily expressed in Python and configuration of such behavior would overcomplicate the interface of AssertCallFunction.

      Unit tests for kernels would then be written in arrow/python/pyarrow/test/kernels/test_*.py. The C++ unit test for addition with implicit casts could then be rewritten as

      def test_addition_implicit_casts():
          AssertCallFunction("add", [[ 0,    1,   2,    None],
                                     [ 0.25, 1.5, 2.75, None]],
                             expected=[0.25, 1.5, 2.75, None])
          # ...

      NB: Some unit tests will probably still reside in C++ since we'll need to test things we don't wish to expose in a user facing API, such as "whether a boolean kernel avoids clobbering bits when outputting into a slice". These should be far more manageable since they won't need to assert correct logic across all possible input types




            Unassigned Unassigned
            bkietz Ben Kietzman
            0 Vote for this issue
            8 Start watching this issue