Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
It looks like extension types are not supported in joins, even when the underlying storage type is supported. Reported by jonkeane while making a demo for Arrow + query engine + geoarrow (an R package), which uses extension types liberally:
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

rb_non_ext <- record_batch(
  a = 1:5,
  b = letters[1:5]
)

rb_ext_storage <- record_batch(
  b = letters[1:5],
  c = Array$create(list(as.raw(1:5)), type = binary())
)

rb_ext <- record_batch(
  b = letters[1:5],
  c = vctrs_extension_array(rb_ext_storage$c$as_vector())
)

rb_non_ext %>% left_join(rb_ext_storage) %>% collect()
#> # A tibble: 5 × 3
#>       a b                      c
#>   <int> <chr>         <arrw_bnr>
#> 1     1 a     01, 02, 03, 04, 05
#> 2     2 b     01, 02, 03, 04, 05
#> 3     3 c     01, 02, 03, 04, 05
#> 4     4 d     01, 02, 03, 04, 05
#> 5     5 e     01, 02, 03, 04, 05

rb_non_ext %>% left_join(rb_ext) %>% collect()
#> Error in `collect()`:
#> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121  ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499  schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)