Apache Arrow / ARROW-3843

[Python] Writing Parquet file from empty table created with Table.from_pandas(..., preserve_index=False) fails

Details

    Description

      import pandas as pd
      import pyarrow.parquet as pq
      import pyarrow as pa


      def test_write_empty_preserve_index():
          # passes
          df = pd.DataFrame()
          table = pa.Table.from_pandas(df, preserve_index=True)
          pq.write_table(table, 'test1.parquet')
          table2 = pq.read_table('test1.parquet')
          df2 = table2.to_pandas()
          pd.testing.assert_frame_equal(df, df2)


      def test_write_empty_no_preserve_index():
          df = pd.DataFrame()
          table = pa.Table.from_pandas(df, preserve_index=False)

          # fails here
          pq.write_table(table, 'test2.parquet')

          table2 = pq.read_table('test2.parquet')
          df2 = table2.to_pandas()
          pd.testing.assert_frame_equal(df, df2)

       

      The first test passes. The second one fails with this:

       

      ___________________________________ test_write_empty_no_preserve_index ___________________________________
      
          def test_write_empty_no_preserve_index():
              df = pd.DataFrame()
              table = pa.Table.from_pandas(df, preserve_index=False)
      
              # fails here
      >       pq.write_table(table, 'test2.parquet')
      
      test_empty.py:24: 
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:1125: in write_table
      writer.write_table(table, row_group_size=row_group_size)
      ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:361: in __exit__
      self.close()
      ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:380: in close
      self.writer.close()
      pyarrow/_parquet.pyx:916: in pyarrow._parquet.ParquetWriter.close
      ???
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      
      > ???
      E pyarrow.lib.ArrowIOError: Root node did not have children
      
      pyarrow/error.pxi:83: ArrowIOError
      

       

      I haven't had a chance to investigate, but this does not seem to be the desired behavior.

       

       

            People

              wesm Wes McKinney
              jlou2u Justin Lewis

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m