Apache Arrow / ARROW-16695

[R][Python][C++] Extension types are not supported in joins

Details

    Description

      It looks like extension types are not supported in joins, at least as non-key fields, even when the underlying storage type is supported! Reported by jonkeane while building a demo of Arrow + the query engine + geoarrow (an R package that uses extension types liberally):

      library(arrow, warn.conflicts = FALSE)
      library(dplyr, warn.conflicts = FALSE)
      
      rb_non_ext <- record_batch(
        a = 1:5, 
        b = letters[1:5]
      )
      
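      # Plain binary column: the join on `b` works as expected
      rb_ext_storage <- record_batch(
        b = letters[1:5],
        c = Array$create(list(as.raw(1:5)), type = binary())
      )
      
      # Same data, but `c` is wrapped in an extension type
      rb_ext <- record_batch(
        b = letters[1:5],
        c = vctrs_extension_array(rb_ext_storage$c$as_vector())
      )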
      
      rb_non_ext %>% 
        left_join(rb_ext_storage) %>% 
        collect()
      #> # A tibble: 5 × 3
      #>       a b                      c
      #>   <int> <chr>         <arrw_bnr>
      #> 1     1 a     01, 02, 03, 04, 05
      #> 2     2 b     01, 02, 03, 04, 05
      #> 3     3 c     01, 02, 03, 04, 05
      #> 4     4 d     01, 02, 03, 04, 05
      #> 5     5 e     01, 02, 03, 04, 05
      
      rb_non_ext %>% 
        left_join(rb_ext) %>% 
        collect()
      #> Error in `collect()`:
      #> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
      #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121  ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
      #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499  schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
      

            People

              Rok Mihevc (rokm)
              Dewey Dunnington (paleolimbot)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m