Apache Arrow

ARROW-17634: pyarrow.fs import reserves large amount of memory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 9.0.0
    • Fix Version/s: 10.0.1
    • Component/s: None
    • Labels: None

    Description

      It seems that in version 9.0.0, `import pyarrow.fs` reserves over a gigabyte (close to 2 GB) of virtual memory; this was not present in 8.0.0.

      Test code:

      def memory_snapshot(label=''):
         from util.System import System  # internal helper wrapping per-process memory counters
         rss = System.process_rss_gigabytes()
         vms = System.process_gigabytes()
         _max = System.process_max_gigabytes()
         print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))
      
      memory_snapshot()
      import pyarrow
      print(pyarrow.__version__)
      memory_snapshot()
      import pyarrow.fs
      memory_snapshot()
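
      For anyone without the internal `util.System` helper, a rough equivalent can be built on psutil (an assumed third-party dependency, not part of the original helper; it reports the same per-process counters, minus the max column):

      import psutil

      def memory_snapshot(label=''):
         # rss and vms come back in bytes from psutil's memory_info()
         info = psutil.Process().memory_info()
         gb = 1024 ** 3
         print("Memory snapshot (%s); rss=%.1f vms=%.1f GB" % (label, info.rss / gb, info.vms / gb))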
      

      8.0.0 output

      Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
      8.0.0
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      

      9.0.0 output

      Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
      9.0.0
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
      

      Digging further into what happens during the import, it seems `initialize_s3` is the culprit:

      before s3 initialize
      Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
      after s3 initialize
      Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
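
      The call can be exercised in isolation; a minimal sketch, assuming the 9.0.0 layout where the internal module `pyarrow._s3fs` exposes `initialize_s3` (an implementation detail that may move between releases):

      import psutil

      def vms_gb():
         # virtual memory size of the current process, in GB
         return psutil.Process().memory_info().vms / 1024 ** 3

      print("before import: vms=%.1f GB" % vms_gb())
      from pyarrow._s3fs import initialize_s3
      print("after import: vms=%.1f GB" % vms_gb())
      initialize_s3()  # the call `import pyarrow.fs` makes at module load in 9.0.0
      print("after initialize_s3: vms=%.1f GB" % vms_gb())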
      

          People

            Assignee: Unassigned
            Reporter: James Coder
            Votes: 2
            Watchers: 4
