Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
Description
Currently, Arrow joins on data that contains a list column raise an error, even when the list column is not a join key. Here's an example using the R bindings:
library(arrow)
library(dplyr)

jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"), jedi = c(FALSE, TRUE))

arrow_table(starwars) %>%
  left_join(jedi) %>%
  collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Data type list<item: string> is not supported in join non-key field
Supporting joins on tables with list columns would be a useful enhancement for workflows with tabular data, where list columns can be common, and for geospatial workflows, where geometry columns are stored as list or fixed_size_list (thanks paleolimbot for mentioning that use case).
Related discussion here: ARROW-14519
Issue Links
- relates to ARROW-14519 [C++] joins segfault when data contains list column (Resolved)