Apache Arrow / ARROW-723

Arrow freezes on write if chunk_size=0

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: Python
    • Labels: None
    • Environment: Linux, macOS

    Description

      Pyarrow freezes on write if chunk_size=0. This can happen by accident, for example when the chunk size is computed as a function of table length and the table is shorter than expected (see the example).

      I would expect pyarrow either to handle this gracefully (e.g. fall back to the chunk_size=None behaviour) or to raise an error.

      ```
      import numpy as np
      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq
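      cols = 'A', 'B', 'C', 'D'
      row = np.arange(4)
      # Single-row frame, so len(data) == 1
      data = pd.DataFrame([row], columns=cols)
      table = pa.Table.from_pandas(data.reset_index(), timestamps_to_ms=True)
      # int(1 / 4) == 0, so chunk_size=0 is passed and the write never returns
      pq.write_table(table, 'test.pq', chunk_size=int(len(data) / 4))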
      ```
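
Until the library guards against this itself, the two behaviours suggested above (fall back to the default, or raise an error) can be approximated on the caller's side. The following is only a minimal sketch: `safe_chunk_size` and its `strict` flag are hypothetical names introduced for illustration, not part of pyarrow and not the fix applied for this issue.

```python
def safe_chunk_size(n_rows, n_chunks, strict=False):
    """Hypothetical helper: derive a chunk size that is never 0."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be a positive integer")
    size = n_rows // n_chunks  # integer division, same result as int(n_rows / n_chunks)
    if size > 0:
        return size
    if strict:
        # "Throw error" option from the description
        raise ValueError(
            "chunk_size would be 0 for %d rows split into %d chunks" % (n_rows, n_chunks)
        )
    # "Handle gracefully" option: None lets the writer use its default behaviour
    return None
```

With the example above, `chunk_size=safe_chunk_size(len(data), 4)` would pass `None` instead of `0` to `pq.write_table`.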

            People

              Assignee: Wes McKinney (wesm)
              Reporter: Jonathan Chambers (mangecoeur)
              Votes: 0
              Watchers: 3