Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
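combine_chunks() fails with ArrowInvalid when called on a column of a table (a ChunkedArray), but does not raise when called on the table itself; the table call instead leaves the column in 3 chunks rather than 1.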
Is there a reason why they are not handled the same?
In [90]: pa.__version__
Out[90]: '4.0.0'

# Get shape
In [85]: pa_table.shape
Out[85]: (102753589, 1)

In [86]: pa_col1_array = pa_table.column(0)

# Get number of chunks
In [87]: pa_col1_array.num_chunks
Out[87]: 4404

# Combining chunks on the pyarrow table with one column works.
In [88]: pa_table.combine_chunks()
Out[88]:
pyarrow.Table
# id=TEW__014e25__c14e1d__Multiome_RNA_brain_10x_no_perm: string

# Combining chunks on the column itself does not work.
In [89]: pa_col1_array.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-89-fdd0d0056a8e> in <module>
----> 1 pa_col1_array.combine_chunks()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: offset overflow while concatenating arrays

# Assign the combined table to a new table.
In [91]: pa_table_combined = pa_table.combine_chunks()

# Get first column
In [92]: pa_col1_array_from_pa_table_combined = pa_table_combined.column(0)

# Get number of chunks
In [93]: pa_col1_array_from_pa_table_combined.num_chunks
Out[93]: 3

# Try to combine column 1 again.
In [94]: pa_col1_array_from_pa_table_combined.combine_chunks()
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-94-e2e323e6519f> in <module>
----> 1 pa_col1_array_from_pa_table_combined.combine_chunks()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.combine_chunks()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.concat_arrays()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

/software/miniconda3/envs/cisTopic/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: offset overflow while concatenating arrays

# Get sizes of each chunk.
In [106]: [chunk.nbytes for chunk in pa_col1_array_from_pa_table_combined.chunks]
Out[106]: [2341650593, 2342925682, 241257842]
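A possible reading of the numbers above, offered as a sketch rather than a confirmed diagnosis: Arrow's string type addresses its character data with 32-bit offsets, so one string array can hold at most 2**31 - 1 bytes of data. The arithmetic below uses only the figures from Out[85] and Out[106]. The 4-bytes-per-row offset correction is an assumption used to approximate the raw character data from nbytes, and the names INT32_MAX, approx_data_bytes, and min_chunks are illustrative.

```python
# Illustrative arithmetic only; no pyarrow import is needed.

# Largest byte offset representable with the 32-bit offsets of a string array.
INT32_MAX = 2**31 - 1

# Figures copied from the session above.
chunk_nbytes = [2341650593, 2342925682, 241257842]  # Out[106]
num_rows = 102753589                                 # Out[85]

total_nbytes = sum(chunk_nbytes)

# Assumption: nbytes includes about 4 bytes of int32 offset per row, so
# subtracting that gives a rough estimate of the character data alone.
approx_data_bytes = total_nbytes - 4 * num_rows

# Ceiling division: fewest 32-bit-offset arrays that could hold this data.
min_chunks = -(-approx_data_bytes // INT32_MAX)

print("total nbytes:        ", total_nbytes)
print("approx. data bytes:  ", approx_data_bytes)
print("fits in one array?   ", approx_data_bytes <= INT32_MAX)
print("minimum string chunks:", min_chunks)
```

Under these assumptions the data needs at least 3 string chunks, which matches the 3 chunks the table-level call returns and would explain why concatenating them into a single array overflows. The linked ARROW-7245 discusses promoting String to LargeString, which uses 64-bit offsets and would not be subject to this limit.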
Issue Links
- is related to ARROW-17828 [C++][Python] Large strings cause ArrowInvalid: offset overflow while concatenating arrays (Open)
- relates to ARROW-7245 [C++] Allow automatic String -> LargeString promotions when concatenating tables (Open)