Details
Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Note: Not sure if this is a limitation of the R library or the underlying C++ code:
I have a ~30 GB Arrow file with almost 1000 columns. It has 12,000 record batches with varying numbers of rows.
1. Is it possible to use read_arrow to filter out columns (similar to how read_feather has col_select =... )?
2. Or is it possible using RecordBatchFileReader to filter columns?
The only thing I seem to be able to do (please confirm whether this is my only option) is loop over all record batches, select a single column at a time, and construct the data I need to pull out manually, i.e. something like the following:
# Batch indices are 0-based, so the last valid index is num_record_batches - 1.
for (i in seq_len(data_rbfr$num_record_batches) - 1) {
  rbn <- data_rbfr$get_batch(i)
  # Column indices are 0-based too: column(5) is the sixth column.
  dfn <- as.data.frame(rbn$column(5)$as_vector())
  if (i == 0) {
    merged <- dfn
  } else {
    merged <- rbind(merged, dfn)
  }
  print(paste(i, nrow(merged)))
}
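For reference, the per-batch loop described above can be sketched so that it does not grow a data frame with rbind() on every iteration. This is only a sketch under stated assumptions: the demo file, its column names (a, b, c) and the chunk size are hypothetical stand-ins for the real ~30 GB file, and it assumes the file is in the Arrow IPC file format (which is also what Feather V2 is). If that assumption holds, read_feather with col_select may be able to read such a file directly, which the last lines try.

```r
library(arrow)

# Hypothetical demo file standing in for the real data: a small data frame
# written as an Arrow IPC (Feather V2) file, split into several record batches.
path <- tempfile(fileext = ".arrow")
demo <- data.frame(a = 1:10, b = letters[1:10], c = (1:10) / 2)
write_feather(demo, path, chunk_size = 4, compression = "uncompressed")

data_rbfr <- RecordBatchFileReader$create(path)
n_batches <- data_rbfr$num_record_batches

# Pull one column (0-based index 2, i.e. "c") out of every batch and keep
# the pieces in a preallocated list instead of calling rbind() each time.
pieces <- vector("list", n_batches)
for (i in seq_len(n_batches)) {
  batch <- data_rbfr$get_batch(i - 1)  # batch indices are 0-based
  pieces[[i]] <- batch$column(2)$as_vector()
}
merged <- data.frame(c = unlist(pieces))

# Assumption: because the file is an Arrow IPC file, read_feather() can read
# it and apply col_select, so only the requested column is materialized.
selected <- read_feather(path, col_select = "c")
```

Collecting the pieces in a list and combining them once keeps the work roughly linear in the number of batches, whereas repeated rbind() copies the accumulated result on every iteration.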