Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
>>> import pyarrow as pa
Registry has 519 pre-compiled functions
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow.gandiva as gandiva
>>> import timeit
>>>
>>> from matplotlib import pyplot as plt
>>> for scale in range(25, 26):
...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
...     df = pd.DataFrame(frame_data).add_prefix("col")
...     table = pa.Table.from_pandas(df)
...
>>>
>>> def float64_add(table):
...     builder = gandiva.TreeExprBuilder()
...     node_a = builder.make_field(table.schema.field_by_name("col0"))
...     node_b = builder.make_field(table.schema.field_by_name("col1"))
...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
...     field_result = pa.field("c", pa.float64())
...     expr = builder.make_expression(sum, field_result)
...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
...     return projector
...
>>> projector = float64_add(table)
>>> projector.evaluate(table.to_batches()[0])
[1] 36393 segmentation fault python
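Evaluating the projector on a record batch with 2**25 rows crashes with a segmentation fault. The cause is an integer overflow in Gandiva:
https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141
The value computed there should be an `int64_t` instead of an `int`.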
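To see why 2**25 rows of float64 could be enough to trigger this, here is a minimal sketch of the arithmetic. It assumes (this is not confirmed by the report) that the overflowing quantity is a buffer size computed as number of rows times the bit width of the type, held in a 32-bit `int`. The helper `as_c_int` is hypothetical and only emulates the usual two's-complement wraparound; in C++ signed overflow is undefined behaviour, so the real outcome may differ.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_c_int(value: int) -> int:
    """Hypothetical helper: wrap a Python int as a 32-bit signed C int would typically store it."""
    return ctypes.c_int32(value).value


# Values taken from the reproduction: 2**25 rows of float64 (64 bits each).
num_rows = 2**25
bit_width = 64

# Assumption: the size is computed in bits before being converted to bytes.
size_in_bits = num_rows * bit_width

# The exact product is one past the largest 32-bit signed value ...
print(size_in_bits, size_in_bits > INT32_MAX)

# ... so a 32-bit int cannot hold it and typically wraps to a negative number.
wrapped = as_c_int(size_in_bits)
print(wrapped)

# Half as many rows still fit, which would match a crash that first appears at 2**25 rows.
print(as_c_int(2**24 * bit_width))
```

Under that assumption, a negative or truncated size would lead to an undersized output buffer and an out-of-bounds write, which is consistent with the segmentation fault above. Widening the variable to `int64_t` keeps the product exact.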