Apache Arrow / ARROW-5139

[Python/C++] Empty column selection no longer restores index

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.12.1
    • Fix Version/s: None
    • Component/s: C++, Python

    Description

      The index of a DataFrame is no longer reconstructed when reading with an empty column selection (columns=[]). This is a regression relative to 0.12.1 and probably only affects pd.RangeIndex.

      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      print(pa.__version__)

      # A single-column frame with the default pd.RangeIndex
      df = pd.DataFrame({"a": [1, 2]})
      print(df.index)

      # Round-trip through an in-memory Parquet file
      table = pa.Table.from_pandas(df)
      buf = pa.BufferOutputStream()
      pq.write_table(table, buf)
      reader = pa.BufferReader(buf.getvalue().to_pybytes())

      # Select no columns; only the index should come back
      table_restored = pq.read_pandas(reader, columns=[])
      df_restored = table_restored.to_pandas()
      print(len(df_restored))  # expected 2 if the index is restored
      

            People

              Assignee: Unassigned
              Reporter: Florian Jetter (fjetter)
              Votes: 0
              Watchers: 5
