Apache Arrow / ARROW-17216

[C++] Support joining tables with non-key fields as list

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      I am trying to join two Arrow tables in which some columns have the list<float> data type. My join keys are all primitive types; only some of the non-key columns are list<float>. However, PyArrow's join() cannot join such tables, although pandas can. It fails with

      ArrowInvalid: Data type list<item: float> is not supported in join non-key field

      when I execute this piece of code

      joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])

      A Stack Overflow answer pointed out that Arrow currently cannot handle non-fixed-width types in joins. Can this be fixed, or is this intentional?

            People

              Assignee: Unassigned
              Reporter: Jayjeet Chakraborty (heyjc)
              Votes: 0
              Watchers: 5
