ARROW-15730: [R] Memory usage in R blows up


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version: 6.0.1
    • Fix Version: None
    • Component: R
    • Labels: None

    Description

      Hi,

      I'm trying to load a ~10 GB Arrow file into R (under Windows).

      (The file was generated with Arrow 6.0.1 under Linux.)

      For whatever reason, memory usage blows up to ~110-120 GB (in a fresh, empty R session).

      The weird thing is that after deleting the object and running gc(), memory usage only drops to ~90 GB. The remaining delta of ~20-30 GB is what I would have expected the data frame to take up in memory; that is also roughly the total that was used during the load under the old Arrow version 0.15.1, and it matches what R reports when I simply print the object size.

      The commands I'm running are simply:

      options(arrow.use_threads = FALSE)
      arrow::set_cpu_count(1)  # needed - otherwise it freezes under Windows
      arrow::read_arrow('file.arrow5')
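
      To narrow down where the memory is actually held, I could compare what R accounts for with what Arrow's allocator reports. A minimal sketch, assuming the default_memory_pool() bindings in the arrow R package behave as documented:

      pool <- arrow::default_memory_pool()
      pool$backend_name                       # which allocator backend is in use
      x <- arrow::read_arrow('file.arrow5')
      pool$bytes_allocated                    # bytes currently held by Arrow's allocator
      print(object.size(x))                   # what R itself accounts for
      rm(x); gc()
      pool$bytes_allocated                    # does Arrow release its allocation after gc()?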

      Is Arrow reserving some resources in the background and not giving them back? Are there any settings I need to change for this?

      Is this a known issue that has been fixed in a newer version?

      Note that this doesn't happen on Linux: there, all the resources are freed up when gc() is called. Not sure if it matters, but on Linux I also don't need to set the CPU count to 1.
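
      In case the allocator is the difference (I believe the default memory pool backend can differ between the Linux and Windows builds), one setting I could try is the ARROW_DEFAULT_MEMORY_POOL environment variable, set before arrow is loaded in a fresh session. A minimal sketch, assuming my build ships the chosen allocator:

      # set before library(arrow) or any arrow:: call in a fresh R session
      Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")   # or "mimalloc" / "jemalloc"
      arrow::default_memory_pool()$backend_name          # confirm which pool is active
      x <- arrow::read_arrow('file.arrow5')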

      Any help would be appreciated.

People

    • Assignee: Will Jones (wjones127)
    • Reporter: Will Jones (wjones127)
    • Votes: 0
    • Watchers: 3
