  1. Apache Arrow
  2. ARROW-1276

Cannot serialize empty DataFrame to Parquet

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Python
    • Labels: None

    Description

      The following code fails with pyarrow.lib.ArrowInvalid: Invalid: chunk size per row_group must be greater than 0, although writing an empty table should succeed:

      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq
      
      df = pd.DataFrame({'x': pd.Series([], dtype=int)})
      table = pa.Table.from_pandas(df)
      buf = pa.InMemoryOutputStream()
      pq.write_table(table, buf)
      

      I have a test and a fix prepared and will upstream both in the upcoming days.

          People

            Assignee: Marco Neumann (marco.neumann.by)
            Reporter: Marco Neumann (marco.neumann.by)
            Votes: 0
            Watchers: 2
