Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
I used the test case at https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25 and found an issue when slicing the input with input_batch[1:]: the Gandiva projector appears to ignore the slice offset.
    import pyarrow as pa
    import pyarrow.gandiva as gandiva

    # Build the expression: res = if (a > b) then a else b
    builder = gandiva.TreeExprBuilder()
    field_a = pa.field('a', pa.int32())
    field_b = pa.field('b', pa.int32())
    schema = pa.schema([field_a, field_b])
    field_result = pa.field('res', pa.int32())
    node_a = builder.make_field(field_a)
    node_b = builder.make_field(field_b)
    condition = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
    if_node = builder.make_if(condition, node_a, node_b, pa.int32())
    expr = builder.make_expression(if_node, field_result)
    projector = gandiva.make_projector(
        schema, [expr], pa.default_memory_pool())

    a = pa.array([10, 12, -20, 5], type=pa.int32())
    b = pa.array([5, 15, 15, 17], type=pa.int32())
    e = pa.array([10, 15, 15, 17], type=pa.int32())  # expected result for the full batch
    input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])

    # Evaluate on a slice that drops the first row (offset 1, length 3)
    r, = projector.evaluate(input_batch[1:])
    print(r)
When the full record batch input_batch is evaluated, the expected output is [10, 15, 15, 17]. For input_batch[1:], the expected output should therefore be [15, 15, 17]; however, this script returns [10, 15, 15], which are the first three values of the full result. The projector appears to ignore the offset and always read from position 0.
A corresponding issue was also filed on GitHub: https://github.com/apache/arrow/issues/4420