Apache Arrow / ARROW-4582

[C++/Python] Memory corruption on Pandas->Arrow conversion

    Details

      Description

      When converting pandas DataFrames with numerical columns to Arrow tables, we saw random segfaults in core Python code. They occurred only in environments with a high degree of parallelisation or slow code execution (e.g. AddressSanitizer builds).

      The cause was that we incremented the reference count of the underlying NumPy buffer without holding the GIL. Reference counts are only safe to modify while the GIL is held, so concurrent updates could corrupt them.
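
      The locking discipline described above can be sketched with a self-contained model. This is not Arrow's code and it does not use the Python C API: `FakeObject`, `fake_gil`, `IncRefWithLock`, and `CountReferences` are hypothetical names, with a `std::mutex` standing in for the GIL and a plain `long` standing in for the object's reference count. In real extension code, the corresponding step would be to acquire the GIL (for example with `PyGILState_Ensure` / `PyGILState_Release`) around the reference count change.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Python object header: a plain, non-atomic
// reference count.
struct FakeObject {
  long refcount = 1;
};

// Hypothetical stand-in for the GIL: one global lock guarding all refcounts.
std::mutex fake_gil;

// Take a new reference while holding the lock, so the read-modify-write on
// the non-atomic counter cannot interleave with another thread's update.
void IncRefWithLock(FakeObject* obj) {
  std::lock_guard<std::mutex> guard(fake_gil);
  ++obj->refcount;
}

// Spawn num_threads workers that each take `increments` references on one
// shared object, then return the final reference count.
long CountReferences(int num_threads, int increments) {
  FakeObject obj;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&obj, increments] {
      for (int i = 0; i < increments; ++i) IncRefWithLock(&obj);
    });
  }
  for (auto& w : workers) w.join();
  return obj.refcount;
}
```

      With the lock in place, `CountReferences(8, 10000)` returns 80001 (the initial reference plus 80000 increments). Dropping the `lock_guard` turns the increment into a data race, which is the kind of unsynchronised update that can silently lose references and later surface as a crash far from the faulty code.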

              People

              • Assignee: Uwe Korn
                Reporter: Uwe Korn
                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m