Apache Arrow / ARROW-8641

[Python] Regression in feather: no longer supports permutation in column selection


Details

    Description

      A quite annoying regression (originally reported in https://github.com/pandas-dev/pandas/issues/33878): when specifying the columns to read, feather.read_table now raises ArrowInvalid if the requested column order is not exactly the same as the order in the file:

      In [26]: import pyarrow as pa

      In [27]: table = pa.table([[1, 2, 3], [4, 5, 6], [7, 8, 9]], names=['a', 'b', 'c'])

      In [29]: from pyarrow import feather

      In [30]: feather.write_feather(table, "test.feather")

      # this works fine
      In [32]: feather.read_table("test.feather", columns=['a', 'b'])
      Out[32]: 
      pyarrow.Table
      a: int64
      b: int64
      
      In [33]: feather.read_table("test.feather", columns=['b', 'a'])
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-33-e01caeabb389> in <module>
      ----> 1 feather.read_table("test.feather", columns=['b', 'a'])
      
      ~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map)
          237         return reader.read_indices(columns)
          238     elif all(map(lambda t: t == str, column_types)):
      --> 239         return reader.read_names(columns)
          240 
          241     column_type_names = [t.__name__ for t in column_types]
      
      ~/scipy/repos/arrow/python/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()
      
      ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Schema at index 0 was different: 
      b: int64
      a: int64
      vs
      a: int64
      b: int64
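
Until the regression is fixed, one possible workaround is to request the columns in file order and permute them afterwards. The sketch below is only an illustration, not the fix adopted in Arrow: the helper names `columns_in_file_order` and `read_table_permuted` are hypothetical, and it assumes the caller already knows the column order stored in the file (passed in as `file_columns`).

```python
def columns_in_file_order(file_columns, requested):
    """Return the requested column names sorted by their position in the file."""
    position = {name: i for i, name in enumerate(file_columns)}
    missing = [name for name in requested if name not in position]
    if missing:
        raise KeyError(f"columns not in file: {missing}")
    return sorted(requested, key=position.__getitem__)


def read_table_permuted(path, file_columns, requested):
    """Hypothetical workaround: read in file order, then reorder in memory."""
    # Imported here so the ordering helper above stays usable without pyarrow.
    import pyarrow as pa
    from pyarrow import feather

    ordered = columns_in_file_order(file_columns, requested)
    table = feather.read_table(path, columns=ordered)
    # Rebuild the table with the columns in the order the caller asked for.
    arrays = [table.column(name) for name in requested]
    return pa.Table.from_arrays(arrays, names=list(requested))
```

With the file from the report, `read_table_permuted("test.feather", ['a', 'b', 'c'], ['b', 'a'])` would read `['a', 'b']` from disk and return a table whose columns are ordered `b`, `a`.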
      

            People

              Unassigned
              Joris Van den Bossche (jorisvandenbossche)