  Apache Arrow / ARROW-2592

[Python] Error reading old Parquet file due to metadata backwards compatibility issue


Details

    Description

      pyarrow 0.8 and 0.9 raise an AssertionError on one of my datasets (created with an older version of pyarrow). Repro steps:

      In [1]: from pyarrow.parquet import ParquetDataset

      In [2]: d = ParquetDataset(['bug.parq'])

      In [3]: t = d.read()

      In [4]: t.to_pandas()
      ---------------------------------------------------------------------------
      AssertionError                            Traceback (most recent call last)
      <ipython-input-4-d17c9e2818f1> in <module>()
      ----> 1 t.to_pandas()

      table.pxi in pyarrow.lib.Table.to_pandas()

      ~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)
          529     # There must be the same number of field names and physical names
          530     # (fields in the arrow Table)
      --> 531     assert len(logical_index_names) == len(index_columns_set)
          532
          533     # It can never be the case in a released version of pyarrow that

      AssertionError:


      Here's the file: https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq

      (I couldn't attach it here because of a "missing token" error, whatever that means.)
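The assertion at line 531 requires that every entry in the file's pandas `index_columns` list has a matching entry in the metadata's `columns` list. The sketch below inspects that metadata outside of `to_pandas()`. It assumes the metadata is JSON stored under the `b"pandas"` schema metadata key and that `index_columns` holds plain strings, as in the pyarrow 0.8/0.9 era. The matching rule (prefer `field_name`, fall back on `name`) is an assumption about how the two lists are paired, not a copy of pyarrow's code. Both helper names are hypothetical.

```python
import json


def load_pandas_metadata(schema_metadata):
    """Decode the pandas JSON blob from an Arrow schema's metadata dict.

    Hypothetical helper; assumes keys and values are bytes and that the
    pandas metadata lives under b"pandas". Returns None if it is absent.
    """
    raw = (schema_metadata or {}).get(b"pandas")
    return json.loads(raw.decode("utf8")) if raw is not None else None


def check_index_metadata(pandas_metadata):
    """Pair index columns with their entries in "columns" (hypothetical).

    Returns (logical_index_names, index_columns_set). If the two lengths
    differ, some index column has no matching entry, which is the kind of
    mismatch the failing assertion reports.
    """
    # Assumption: index_columns is a list of strings; anything else is skipped.
    index_columns_set = frozenset(
        n for n in pandas_metadata.get("index_columns", []) if isinstance(n, str)
    )
    logical_index_names = [
        c.get("name")
        for c in pandas_metadata.get("columns", [])
        # Assumption: newer metadata records the physical column name in
        # "field_name"; older files may lack it, so fall back on "name".
        if c.get("field_name", c.get("name")) in index_columns_set
    ]
    return logical_index_names, index_columns_set
```

In the session above, `t.schema.metadata` could be passed to `load_pandas_metadata` and the result to `check_index_metadata`. A length mismatch between the two returned values should point at the index entries the old file's metadata fails to describe.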


            People

              Wes McKinney (wesm)
              Dima Ryazanov (dimaryaz)
              Votes: 0
              Watchers: 4


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h