[ARROW-8641] [Python] Regression in feather: no longer supports permutation in column selection - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.17.1, 1.0.0
Component/s: C++, Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/24802

Description

A quite annoying regression (originally reported in https://github.com/pandas-dev/pandas/issues/33878): when specifying columns to read, reading now fails if the columns are not listed in exactly the same order as in the file:

In [27]: table = pa.table([[1, 2, 3], [4, 5, 6], [7, 8, 9]], names=['a', 'b', 'c'])

In [29]: from pyarrow import feather

In [30]: feather.write_feather(table, "test.feather")

# this works fine
In [32]: feather.read_table("test.feather", columns=['a', 'b'])
Out[32]:
pyarrow.Table
a: int64
b: int64

In [33]: feather.read_table("test.feather", columns=['b', 'a'])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-33-e01caeabb389> in <module>
----> 1 feather.read_table("test.feather", columns=['b', 'a'])

~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map)
    237         return reader.read_indices(columns)
    238     elif all(map(lambda t: t == str, column_types)):
--> 239         return reader.read_names(columns)
    240 
    241     column_type_names = [t.__name__ for t in column_types]

~/scipy/repos/arrow/python/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different: 
b: int64
a: int64
vs
a: int64
b: int64
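To make the failure mode concrete, here is a minimal pure-Python sketch (no pyarrow required) of the behaviour the report expects. It assumes, based on the traceback, that the reader dispatches on whether `columns` holds all ints or all strs. It also assumes, based on the "Schema at index 0 was different" message, that the underlying read returns the selected columns in file order (`a, b`) rather than in the requested order (`b, a`). Under those assumptions, honouring a permutation only needs a final reordering step. The helpers `resolve_positions`, `read_in_file_order` and `read_table_sketch` are hypothetical illustrations, not pyarrow APIs and not the code from the linked pull request.

```python
from typing import Dict, List, Sequence, Tuple, Union

Column = Union[int, str]
# Hypothetical stand-in for a table: ordered (name, values) pairs.
Table = List[Tuple[str, list]]


def resolve_positions(file_names: Sequence[str], columns: Sequence[Column]) -> List[int]:
    """Map a selection of all-int indices or all-str names to file positions."""
    if all(type(c) is int for c in columns):
        return [int(c) for c in columns]
    if all(type(c) is str for c in columns):
        position = {name: i for i, name in enumerate(file_names)}
        return [position[str(c)] for c in columns]
    raise TypeError(f"Columns must be indices or names, got {list(columns)!r}")


def read_in_file_order(file_columns: Dict[str, list], positions: Sequence[int]) -> Dict[str, list]:
    """Simulate a reader that honours the selected subset but not its order."""
    names = list(file_columns)
    return {names[i]: file_columns[names[i]] for i in sorted(set(positions))}


def read_table_sketch(file_columns: Dict[str, list], columns: Sequence[Column]) -> Table:
    """Read a column subset, then permute it into the order the caller requested."""
    names = list(file_columns)
    positions = resolve_positions(names, columns)
    subset = read_in_file_order(file_columns, positions)
    # The permutation step: walk the caller's order, not the file's order.
    return [(names[i], subset[names[i]]) for i in positions]


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
    print([name for name, _ in read_table_sketch(data, ["b", "a"])])  # ['b', 'a']
```

Keeping the read and the reorder as separate steps means the reader itself only has to handle a subset of columns; the requested order (including a permutation like `['b', 'a']`) is applied afterwards, so no schema comparison against the requested order is needed during the read.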

Issue Links

links to

GitHub Pull Request #7122

People

Assignee: Unassigned

Reporter: Joris Van den Bossche

Votes: 0

Watchers: 4

Dates

Created: 30/Apr/20 06:36

Updated: 11/Jan/23 08:01

Resolved: 07/May/20 16:24

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 0.5h