  1. Apache Arrow
  2. ARROW-17360

[Python] Order of columns in pyarrow.feather.read_table

Details

    Description

      xref https://github.com/pandas-dev/pandas/issues/47944

       

      In [1]: df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
      
      # pandas main branch / 1.5
      In [2]: df.to_orc("abc")
      
      In [3]: pd.read_orc("abc", columns=['b', 'a'])
      Out[3]:
         a  b
      0  1  a
      1  2  b
      2  3  c
      
      In [4]: import pyarrow.orc as orc
      
      In [5]: orc_file = orc.ORCFile("abc")
      
      # requested ['b', 'a'], but the result is reordered to a, b
      In [6]: orc_file.read(columns=['b', 'a']).to_pandas()
      Out[6]:
         a  b
      0  1  a
      1  2  b
      2  3  c
      
      # same on the pyarrow.Table itself: reordered to a, b
      In [7]: orc_file.read(columns=['b', 'a'])
      Out[7]:
      pyarrow.Table
      a: int64
      b: string
      ----
      a: [[1,2,3]]
      b: [["a","b","c"]] 
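
Until the reader honors the requested order, one possible workaround is to reorder the columns after reading. The sketch below is an assumption, not the fix adopted for this issue: `read_columns_in_order` is a hypothetical helper name, and it relies only on `ORCFile.read(columns=...)` as used above and on `pyarrow.Table.select`, which returns a new table with the columns in the order given.

```python
def read_columns_in_order(orc_file, columns):
    """Read `columns` from an ORC file and return them in the requested order.

    Hypothetical helper (not part of pyarrow). `orc_file` is expected to be
    a pyarrow.orc.ORCFile, e.g. orc.ORCFile("abc") from the session above.
    """
    # The read may hand the columns back in file order (a, b) ...
    table = orc_file.read(columns=columns)
    # ... so re-select them explicitly in the order the caller asked for.
    return table.select(columns)


# Usage, assuming the "abc" file written in the session above:
#   import pyarrow.orc as orc
#   table = read_columns_in_order(orc.ORCFile("abc"), ["b", "a"])
#   table.column_names   # expected to follow the requested order
```

At the pandas level the equivalent would be indexing the resulting DataFrame with the same list, i.e. `df[['b', 'a']]`.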

          People

            alenka Alenka Frim
            moeschke NULL