Apache Arrow / ARROW-15731

[C++] Enable joins when data contains a list column

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      Currently, an Arrow join raises an error when the data contains a list column, even when the list column is not a join key. Here's an example using the R bindings:

      library(arrow)
      library(dplyr)
      
      jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
                         jedi = c(FALSE, TRUE))
      
      arrow_table(starwars) %>%
        left_join(jedi) %>%
        collect()
      #> Error in `handle_csv_read_error()`:
      #> ! Invalid: Data type list<item: string> is not supported in join non-key field
      

      The ability to join such data would be a useful enhancement for tabular-data workflows, where list columns can be common, and for geospatial workflows, where geometry columns are stored as list or fixed_size_list (thanks to paleolimbot for mentioning that use case).

      Related discussion here: ARROW-14519

       

            People

              Assignee: Unassigned
              Reporter: Stephanie Hazlitt (stephhazlitt)