Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
While working on integration with DuckDB, we ran into an issue where errors do not appear to be propagated fully or correctly through record batch readers using the C interface. The DuckDB issue where this came up is https://github.com/duckdb/duckdb/issues/2055
In the example below, I pass a dataset with either one or two files from R to Python. I have deliberately mis-specified the schema to trigger an error. The one-file version works as I expect, percolating the error up:
> library("arrow")
>
> venv <- try(reticulate::virtualenv_create("arrow-test"))
virtualenv: arrow-test
> install_pyarrow("arrow-test", nightly = TRUE)
[output from installing pyarrow ...]
> reticulate::use_virtualenv("arrow-test")
>
> file <- "arrow/r/inst/v0.7.1.parquet"
> arrow_table <- arrow::open_dataset(rep(file, 1), schema(x=arrow::null()))
>
> scan <- Scanner$create(arrow_table)
> reader <- scan$ToRecordBatchReader()
> pyreader <- reticulate::r_to_py(reader)
> pytab <- pyreader$read_all()
Error in py_call_impl(callable, dots$args, dots$keywords) :
  OSError: NotImplemented: Unsupported cast from double to null using function cast_null

Detailed traceback:
  File "pyarrow/ipc.pxi", line 563, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
But with two (or more) files, the process hangs while reading all of the batches:
> library("arrow")
>
> venv <- try(reticulate::virtualenv_create("arrow-test"))
virtualenv: arrow-test
> install_pyarrow("arrow-test", nightly = TRUE)
[output from installing pyarrow ...]
> reticulate::use_virtualenv("arrow-test")
>
> file <- "arrow/r/inst/v0.7.1.parquet"
> arrow_table <- arrow::open_dataset(rep(file, 2), schema(x=arrow::null()))
>
> scan <- Scanner$create(arrow_table)
> reader <- scan$ToRecordBatchReader()
> pyreader <- reticulate::r_to_py(reader)
> pytab <- pyreader$read_all()
{hangs forever here}