Apache Arrow / ARROW-13681

[C++] list_parent_indices only computes for first chunk


Details

    Description

      PyArrow version: 5.0.0
      Python version: 3.7.9

      I came across this issue because of very unexpected behaviour from the "explode" function given here:

      https://issues.apache.org/jira/browse/ARROW-12099

      The relevant line in that function is:

        indices = pc.list_parent_indices(table[col_name])

      If table[col_name] contains several chunks, the indices look perfectly fine for the first chunk but are erratic and unexpected for the second chunk.
      No warning or other message is emitted either.

      A workaround that solved the problem for me is:

        indices = pc.list_parent_indices(table.combine_chunks()[col_name])
      

      With the chunks combined first, the returned indices are what I expect.

      I'm assuming this isn't expected and should be fixed?

            People

              apitrou Antoine Pitrou
              TorMcK Tor Eivind McKenzie-Syvertsen
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m