Apache Arrow / ARROW-15729

[R] Reading large files randomly freezes


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      Hi -

      I recently upgraded to Arrow 6.0.1 and am using it in R.

      When I read a large file (~10 GB) on Windows, R sometimes freezes at random. I can see memory being allocated during the first 10-20 seconds, but then nothing happens and R stops responding (the R process also goes idle).

      I'm setting options(arrow.use_threads=FALSE).

      I didn't have this issue with the version I was using previously (0.15.1), and the file reads fine on Linux.

      I would post a reproducible example, but the freeze happens randomly. I also tried reading large files in pieces by first getting the distinct values of a specific column (with compute>collect), but that hangs too.

      Any ideas would be appreciated.

      Edit

      Not sure if this makes sense to anyone, but after a few tries it seems the issue only happens in RStudio; in the plain R console the file loads fine. All I'm executing is the following:

      options(arrow.use_threads=FALSE)
      aa <- arrow::read_arrow('.../file.arrow5')

      One thing I want to point out: the underlying Rscript process under RStudio definitely seems to use more than one core when executing the above, even with arrow.use_threads=FALSE.

      Edit 2

      Using arrow::set_cpu_count(1) seems to solve the issue.
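      A minimal sketch of that workaround, assuming you only want Arrow limited to one CPU thread while the file is being read: the hypothetical helper read_single_threaded() below (not part of the arrow package) saves the current pool size from arrow::cpu_count(), calls arrow::set_cpu_count(1), reads the file with the same arrow::read_arrow() call used above, and restores both the pool size and the arrow.use_threads option when it exits, even on error. Whether restoring the pool size afterwards is safe with respect to this bug is an assumption, not something confirmed in this issue.

```r
# Hypothetical helper (not part of arrow): read an Arrow file with Arrow's
# CPU thread pool temporarily limited to a single thread.
read_single_threaded <- function(path) {
  # Remember the current CPU thread pool size and restore it on exit.
  old_cpus <- arrow::cpu_count()
  on.exit(arrow::set_cpu_count(old_cpus), add = TRUE)

  # options() returns the previous values, so they can be restored on exit.
  old_opt <- options(arrow.use_threads = FALSE)
  on.exit(options(old_opt), add = TRUE)

  # The workaround reported above: a single CPU thread for Arrow.
  arrow::set_cpu_count(1)
  arrow::read_arrow(path)
}
```

      Usage mirrors the original call, e.g. aa <- read_single_threaded('.../file.arrow5'). If you prefer to keep Arrow single-threaded for the whole session, calling arrow::set_cpu_count(1) once at startup, as in Edit 2, is simpler.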

       


    People

      Assignee: Unassigned
      Reporter: Klar Christian
      Votes: 0
      Watchers: 2
