Apache Arrow / ARROW-6570

[Python] Use MemoryPool to allocate memory for NumPy arrays in to_pandas calls

    Details

      Description

      It occurred to me that we can likely improve the performance and scalability of Table.to_pandas and the other to_pandas methods by using the active MemoryPool to allocate memory for the resulting NumPy arrays, rather than letting NumPy use the system allocator. We would need to use the PyCapsule approach to set a shared_ptr<Buffer> as the base of the created NumPy arrays, so that the buffer stays alive for as long as the array does.

      This has the additional benefit of tracking NumPy-related allocations in the MemoryPool, so we would have a more precise accounting of allocated memory.

              People

              • Assignee:
                wesm Wes McKinney
                Reporter:
                wesm Wes McKinney

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h