Apache Arrow / ARROW-10197

[Gandiva][Python] Execute expression on filtered data

Details

    Description

      It looks like there is no way to execute an expression on filtered data from Python.
      Specifically, I cannot pass a `SelectionVector` to the projector's `evaluate` method:

      ```python
      import pyarrow as pa
      import pyarrow.gandiva as gandiva

      table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                                    pa.array([5., 45., 36., 73.,
                                              83., 23., 76.])],
                                   ['a', 'b'])

      builder = gandiva.TreeExprBuilder()
      node_a = builder.make_field(table.schema.field("a"))
      node_b = builder.make_field(table.schema.field("b"))
      fifty = builder.make_literal(50.0, pa.float64())
      eleven = builder.make_literal(11.0, pa.float64())

      cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
      cond_2 = builder.make_function("greater_than", [node_a, node_b],
                                     pa.bool_())
      cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
      cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
      condition = builder.make_condition(cond)

      filter = gandiva.make_filter(table.schema, condition)

      # filterResult has type SelectionVector
      filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
      print(filterResult)

      sum = builder.make_function("add", [node_a, node_b], pa.float64())
      field_result = pa.field("c", pa.float64())
      expr = builder.make_expression(sum, field_result)
      projector = gandiva.make_projector(
          table.schema, [expr], pa.default_memory_pool())

      # Problem: I don't know how to use filterResult with the projector;
      # `evaluate` does not seem to accept a SelectionVector as a second argument.
      r, = projector.evaluate(table.to_batches()[0], filterResult)
      ```

      In C++, a `SelectionVector` can be passed as the second argument to `Projector::Evaluate`: https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270

      In `gandiva.pyx`, however, this does not appear to be possible: https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154

            People

              klykov Kirill Lykov

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1.5h
