Apache Arrow

ARROW-17634: pyarrow.fs import reserves large amount of memory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 9.0.0
    • Fix Version/s: 10.0.1
    • Component/s: None
    • Labels: None

    Description

      It seems that in version 9.0.0, `import pyarrow.fs` reserves over a gigabyte (close to 2 GB) of virtual memory; this was not present in 8.0.0.

      Test code:

      def memory_snapshot(label=''):
         from util.System import System  # internal helper wrapping per-process memory counters
         rss = System.process_rss_gigabytes()
         vms = System.process_gigabytes()
         _max = System.process_max_gigabytes()
         print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))
      
      memory_snapshot()
      import pyarrow
      print(pyarrow.__version__)
      memory_snapshot()
      import pyarrow.fs
      memory_snapshot()
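
      For anyone without the internal `util.System` helper, a rough equivalent can be built on psutil (an assumed third-party dependency, not part of the original helper; it reports the same per-process counters, minus the max column):

      import psutil

      def memory_snapshot(label=''):
         # rss and vms come back in bytes from psutil's memory_info()
         info = psutil.Process().memory_info()
         gb = 1024 ** 3
         print("Memory snapshot (%s); rss=%.1f vms=%.1f GB" % (label, info.rss / gb, info.vms / gb))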
      

      8.0.0 output

      Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
      8.0.0
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      

      9.0.0 output

      Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
      9.0.0
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
      

      Digging further into what happens during the import, it seems `initialize_s3` is the culprit:

      before s3 initialize
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      after s3 initialize
      Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
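
      The call can be exercised in isolation; a minimal sketch, assuming the 9.0.0 layout where the internal module `pyarrow._s3fs` exposes `initialize_s3` (an implementation detail that may move between releases):

      import psutil

      def vms_gb():
         # virtual memory size of the current process, in GB
         return psutil.Process().memory_info().vms / 1024 ** 3

      print("before import: vms=%.1f GB" % vms_gb())
      from pyarrow._s3fs import initialize_s3
      print("after import: vms=%.1f GB" % vms_gb())
      initialize_s3()  # the call `import pyarrow.fs` makes at module load in 9.0.0
      print("after initialize_s3: vms=%.1f GB" % vms_gb())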
      

          People

            Assignee: Unassigned
            Reporter: James Coder
            Votes: 2
            Watchers: 4
