Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Labels: None
Description
Repro:
library(sparklyr)
library(arrow)
sc <- spark_connect(master = "local")
sdf_len(sc, 10^5) %>% dplyr::mutate(batch = id %% 10)
This produces the following segfault using Arrow 0.12; no repro under Arrow 0.11:
*** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: RecordBatch__to_dataframe(x, use_threads = use_threads)
 2: `as_tibble.arrow::RecordBatch`(record_entry)
 3: tibble::as_tibble(record_entry)
 4: arrow_read_stream(.)
 5: function_list[[i]](value)
 6: freduce(value, `_function_list`)
 7: `_fseq`(`_lhs`)
 8: eval(quote(`_fseq`(`_lhs`)), env, env)
 9: eval(quote(`_fseq`(`_lhs`)), env, env)
10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
11: invoke_static(sc, "sparklyr.ArrowConverters", "toArrowBatchRdd", sdf, session, time_zone) %>% arrow_read_stream() %>% dplyr::bind_rows()
12: arrow_collect(object, ...)
Note that the following cast is unsupported; I can add a test if someone can come up with a way of creating a decimal type.
batch <- table(tibble::tibble(x = 1:10))
batch$cast(schema(x = decimal()))
Error in Decimal128Type__initialize(precision, scale) : argument "precision" is missing, with no default
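The error above comes from calling decimal() with no arguments. A minimal sketch of a workaround, assuming the same arrow R API as in the snippet above: decimal() takes explicit precision and scale arguments, so supplying both at least avoids the "argument precision is missing" error (whether the cast itself then succeeds may still depend on the Arrow version).

library(arrow)
library(tibble)

batch <- table(tibble(x = 1:10))
# Hypothetical choice of precision = 10, scale = 2 for illustration;
# decimal(precision, scale) has no defaults, so both must be given.
batch$cast(schema(x = decimal(precision = 10, scale = 2)))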
I'll send a PR with a fix...