  Apache Arrow / ARROW-7520

[R] Writing many batches causes a crash


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.15.1
    • Fix Version/s: 0.17.0
    • Component/s: R
    • Labels: None

    Description

      Hi,

      When I create more than about 200-300 batches, writing them to an Arrow file crashes R without any error message; RStudio simply aborts.

      My guess is that each batch becomes a separate stream and R runs into trouble with the connections, but that is only a guess.

      Any help would be appreciated.

      ##

      The function is below. Running it with 3000 batches crashes R immediately.

      Before that, I ran it with 100 batches and increased the number gradually, and at some point it crashed again, seemingly at random.

      ##

      On a later run, I received this error message after writing 30 batches:

      Error in ipc__RecordBatchWriter_WriteRecordBatch(self, batch) :
      Invalid: Invalid operation on closed file
      Error in ipc__RecordBatchWriter_WriteRecordBatch(self, batch) :
      Invalid: Invalid operation on closed file

      ##

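      library(arrow)
      library(plyr)   # load plyr before dplyr so dplyr's rename/mutate/filter are not masked
      library(dplyr)
      library(glue)

      write_arrow_custom <- function(df, targetarrow, nrbatches) {
        ct <- nrbatches

        # ct + 1 cut points spread evenly over the rows of df
        idxs <- c(0:ct) / ct * nrow(df)
        idxs <- round(idxs, 0) %>% as.integer()
        idxs[length(idxs)] <- nrow(df)

        # One row per batch: first row (colfrom), last row (colto), batch number (R)
        df_nav <- idxs %>% as.data.frame() %>% rename(colfrom = 1) %>%
          mutate(colto = lead(colfrom)) %>% mutate(colfrom = colfrom + 1) %>%
          filter(!is.na(colto)) %>% mutate(R = row_number())

        # Sanity check: the batches cover every row of df exactly once
        stopifnot(df_nav %>% mutate(chk = colto - colfrom + 1) %>% '$'('chk') %>% sum() == nrow(df))

        # Schema taken from the first row, plus a "name" column holding the row names
        table_df <- Table$create(name = rownames(df[1, ]), df[1, ])
        writer <- RecordBatchFileWriter$create(targetarrow, table_df$schema)

        df_nav %>% dlply(c('R'), function(df_nav) {
          # Progress output: "from:to/batch..."
          message(glue('{df_nav$colfrom[1]}:{df_nav$colto[1]}/{df_nav$R[1]}...'))
          tmp <- df[df_nav$colfrom[1]:df_nav$colto[1], ]
          writer$write_batch(record_batch(name = rownames(tmp), tmp))
          NULL
        }) -> batch_lst

        writer$close()
        rm(batch_lst)
        gc()
      }

      write_arrow_custom(data.frame(A = c(1:100000), B = c(1:100000)), 'C:/Temp/test.arrow', 3000)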
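The batch boundaries that `write_arrow_custom()` builds with dplyr and plyr can also be computed in base R. The sketch below uses a hypothetical helper, `batch_bounds()` (not part of the arrow package or the original report), that returns the same `R`/`colfrom`/`colto` table under the assumption of at most one batch per row. Having the indices in a plain data frame may make it easier to confirm that the index arithmetic is not what triggers the crash before looking at the writer itself.

```r
# Hypothetical helper (not part of arrow): first and last row of each of `k`
# contiguous, near-equal batches covering rows 1..n, using base R only.
batch_bounds <- function(n, k) {
  stopifnot(k >= 1, k <= n)  # assumption: at most one batch per row
  # k + 1 evenly spaced cut points from 0 to n, rounded to whole rows
  cuts <- as.integer(round(seq(0, n, length.out = k + 1)))
  data.frame(
    R = seq_len(k),                 # batch number
    colfrom = head(cuts, -1) + 1L,  # first row of each batch
    colto = tail(cuts, -1)          # last row of each batch
  )
}

# Same sizes as the reproducer: 100000 rows split into 3000 batches
nav <- batch_bounds(100000, 3000)
# The batches cover every row exactly once
stopifnot(sum(nav$colto - nav$colfrom + 1) == 100000)
```

Each row of `nav` can then drive one `df[nav$colfrom[i]:nav$colto[i], ]` slice, exactly as `df_nav` does in the function above.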



            People

              Assignee: Neal Richardson
              Reporter: Klar Christian
              Votes: 0
              Watchers: 3
