ARROW-5471

[C++][Gandiva] Array offset is ignored in Gandiva projector

Details

    Description

      While adapting the test case at https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25, I found a problem when passing a sliced record batch, input_batch[1:], to the Gandiva projector: the slice offset appears to be ignored.

      import pyarrow as pa
      import pyarrow.gandiva as gandiva
      
      builder = gandiva.TreeExprBuilder()
      
      field_a = pa.field('a', pa.int32())
      field_b = pa.field('b', pa.int32())
      
      schema = pa.schema([field_a, field_b])
      
      field_result = pa.field('res', pa.int32())
      
      node_a = builder.make_field(field_a)
      node_b = builder.make_field(field_b)
      
      condition = builder.make_function("greater_than", [node_a, node_b],
                                        pa.bool_())
      if_node = builder.make_if(condition, node_a, node_b, pa.int32())
      
      expr = builder.make_expression(if_node, field_result)
      
      projector = gandiva.make_projector(
          schema, [expr], pa.default_memory_pool())
      
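      a = pa.array([10, 12, -20, 5], type=pa.int32())
      b = pa.array([5, 15, 15, 17], type=pa.int32())
      # Expected result for the full batch: the larger of a and b, row by row.
      e = pa.array([10, 15, 15, 17], type=pa.int32())
      input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])
      
      # Evaluate on a slice that skips the first row (offset 1).
      r, = projector.evaluate(input_batch[1:])
      print(r)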
      

      With the full record batch input_batch, the expected output is [10, 15, 15, 17]. With input_batch[1:], the expected output is therefore [15, 15, 17], but the script returns [10, 15, 15]. The projector appears to ignore the offset and always read from index 0.
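The symptom can be reproduced without pyarrow using a toy model. The sketch below is not Gandiva's implementation: `ArrayView` and `project` are hypothetical names introduced only for illustration. It assumes a slice is a zero-copy view that shares the parent's buffer and records an offset, and compares an evaluator that honors the offset with one that always starts at index 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayView:
    """Toy stand-in for an array: a shared buffer plus an offset and a length."""
    buffer: List[int]
    offset: int
    length: int

    def slice(self, start: int) -> "ArrayView":
        # Zero-copy slice: same buffer, shifted offset, shorter length.
        return ArrayView(self.buffer, self.offset + start, self.length - start)


def project(a: ArrayView, b: ArrayView, honor_offset: bool) -> List[int]:
    """Evaluate `if a > b then a else b` row by row.

    With honor_offset=False the buffers are read from index 0,
    which models the behavior described in this report.
    """
    out = []
    for i in range(a.length):
        ia = i + (a.offset if honor_offset else 0)
        ib = i + (b.offset if honor_offset else 0)
        x, y = a.buffer[ia], b.buffer[ib]
        out.append(x if x > y else y)
    return out


a = ArrayView([10, 12, -20, 5], offset=0, length=4)
b = ArrayView([5, 15, 15, 17], offset=0, length=4)

full = project(a, b, honor_offset=True)
buggy = project(a.slice(1), b.slice(1), honor_offset=False)
fixed = project(a.slice(1), b.slice(1), honor_offset=True)

print(full)   # [10, 15, 15, 17]
print(buggy)  # [10, 15, 15]
print(fixed)  # [15, 15, 17]
```

In this model the offset-ignoring evaluator returns the first three rows of the full result, which matches the output reported above. That is consistent with the projector reading each input buffer from its start rather than from the slice's offset.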

       

      A corresponding issue has also been filed on GitHub: https://github.com/apache/arrow/issues/4420

            People

              zeyuanxy Zeyuan Shang

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h