Apache Arrow / ARROW-3698

[C++] Segmentation fault when using a large table in Gandiva

Details

    Description

      >>> import pyarrow as pa
      Registry has 519 pre-compiled functions
      >>> import pandas as pd
      >>> import numpy as np
      >>> import pyarrow.gandiva as gandiva
      >>> import timeit
      >>>
      >>> from matplotlib import pyplot as plt
      >>> for scale in range(25, 26):
      ...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
      ...     df = pd.DataFrame(frame_data).add_prefix("col")
      ...     table = pa.Table.from_pandas(df)
      ...
      >>>
      >>> def float64_add(table):
      ...     builder = gandiva.TreeExprBuilder()
      ...     node_a = builder.make_field(table.schema.field_by_name("col0"))
      ...     node_b = builder.make_field(table.schema.field_by_name("col1"))
      ...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
      ...     field_result = pa.field("c", pa.float64())
      ...     expr = builder.make_expression(sum, field_result)
      ...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
      ...     return projector
      ...
      >>> projector = float64_add(table)
      >>> projector.evaluate(table.to_batches()[0])
      [1] 36393 segmentation fault python

      The crash is caused by an integer overflow in Gandiva:
      https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141

      The type used there should be `int64_t` instead of `int`.
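
The linked line is not quoted in this report, so the following is only a sketch of the arithmetic. It assumes that the overflowing quantity is a buffer size in bits, computed as row count times the 64-bit width of a float64 value. The helper names `bits_as_int32` and `bits_as_int64` are hypothetical and do not come from Gandiva. Under that assumption, the 2**25 rows in the reproduction are exactly the point where a signed 32-bit `int` can no longer hold the result:

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed int


def bits_as_int32(num_rows, bit_width):
    """Hypothetical helper: rows * bit width, truncated to a signed 32-bit int.

    ctypes performs no overflow checking, so the value wraps around.
    In C++ signed overflow is undefined behaviour; wrapping is only the
    typical outcome on two's-complement hardware.
    """
    return ctypes.c_int32(num_rows * bit_width).value


def bits_as_int64(num_rows, bit_width):
    """Hypothetical helper: the same product held in a signed 64-bit int."""
    return ctypes.c_int64(num_rows * bit_width).value


num_rows = 2**25   # rows in the table built by the reproduction above
bit_width = 64     # bits per float64 value

true_bits = num_rows * bit_width                # Python ints do not overflow
wrapped = bits_as_int32(num_rows, bit_width)    # what a 32-bit int would hold
wide = bits_as_int64(num_rows, bit_width)       # what an int64_t would hold

print("exceeds INT32_MAX:", true_bits > INT32_MAX)
print("as 32-bit int:", wrapped)
print("as 64-bit int:", wide)
```

With half as many rows (2**24) the product is 2**30 and still fits in 32 bits. That would be consistent with the crash first appearing at this table size, if the assumption about the computed quantity holds.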

            People

              suquark Siyuan Zhuang

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 20m