Apache Arrow / ARROW-375

columns parameter in parquet.read_table() raises KeyError for valid column


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: Python
    • Labels: None

    Description

      Using Arrow commit 4fa7ac4 and parquet-cpp commit 0024665, reading the whole file works, but passing columns=['age'] raises a KeyError:

      In [1]: from pyarrow import parquet
      
      In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
      
      In [3]: t.to_pandas()
      Out[3]: 
         age name
      0    1    A
      1    2    B
      2    3    C
      
      In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
      ---------------------------------------------------------------------------
      KeyError                                  Traceback (most recent call last)
      <ipython-input-4-5cf213819489> in <module>()
      ----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
      
      /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
          143         return reader.read_all()
          144     else:
      --> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
          146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
          147         return Table.from_arrays(columns, arrays)
      
      /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
          102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
          103 
      --> 104         return self.column_idx_map[column_name]
          105 
          106     def read_column(self, int column_index):
      
      KeyError: 'age'
      

      This happens on both Mac and Linux.
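The report does not state the cause, but the traceback hints at one. The build directory name (temp.macosx-10.6-x86_64-3.5) suggests Python 3.5, and column_name_idx keys column_idx_map with str(...ToDotString()). Assuming the std::string returned by ToDotString() reaches Python 3 as bytes (Cython's default conversion for std::string), str() returns the bytes repr "b'age'", so a lookup of 'age' misses. The sketch below reproduces that mismatch in plain Python and shows a decode-based alternative; the path list and all function names are hypothetical stand-ins, not pyarrow's API.

```python
# Hypothetical stand-in for the dotted column paths that ToDotString()
# would return, assumed to arrive in Python 3 as bytes.
column_paths = [b"age", b"name"]


def build_index_map_buggy(paths):
    # Mirrors the pattern in the traceback: str() applied to bytes.
    # On Python 3, str(b"age") is the repr "b'age'", not "age".
    return {str(path): i for i, path in enumerate(paths)}


def build_index_map_fixed(paths):
    # Decode explicitly so keys match the names users pass in.
    # UTF-8 is an assumption about how column names are encoded.
    return {path.decode("utf-8"): i for i, path in enumerate(paths)}


def column_name_idx(index_map, column_name):
    # Like the original method, an unknown name raises KeyError.
    return index_map[column_name]


buggy = build_index_map_buggy(column_paths)
fixed = build_index_map_fixed(column_paths)

print(sorted(buggy))  # ["b'age'", "b'name'"]
print(sorted(fixed))  # ['age', 'name']

try:
    column_name_idx(buggy, "age")
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'age'

print(column_name_idx(fixed, "age"))  # 0
```

On Python 2, bytes is an alias for str, so the same str() call would yield 'age'; if this inference is right, the failure would be specific to Python 3 builds.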


    People

      Assignee: Wes McKinney (wesm)
      Reporter: Christopher Aycock (chrisaycock)