Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.5.0
Fix Version/s: None
Environment: Python 3.5.2, pyarrow 0.5.0
Description
I had a program that reads several CSV files (a few million rows each, 9 columns) and converts each one to Parquet with:
    import os
    import pandas as pd
    import pyarrow.parquet as pq
    import pyarrow

    def to_parquet(output_file, csv_file):
        df = pd.read_csv(csv_file)
        # Strip leading zeros from every value in the gecco_variant column.
        df['gecco_variant'] = [v.lstrip('0') for v in df['gecco_variant']]
        # Convert the DataFrame to an Arrow table and write it as Parquet.
        table = pyarrow.Table.from_pandas(df)
        pq.write_table(table, output_file)
The first CSV file always converted successfully, but Python would hang on the second or third file, and sometimes only on a much later one.
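When a loop like this hangs partway through, it helps to know which file and which call it is stuck in. The following is a minimal diagnostic sketch, not part of the original report: a hypothetical `convert_all` driver that calls a conversion function (such as `to_parquet` above) for each (output, input) pair and arms the standard library's `faulthandler.dump_traceback_later` watchdog, so that if a single file takes longer than `timeout_s` seconds, the Python stack of every thread is printed to stderr while the process keeps running. The function name, the pair-based interface, and the 600-second default are assumptions chosen for illustration. If the hang is inside native pyarrow code, the dump shows the last Python frame that called into it. The code avoids f-strings and variable annotations so it also runs on Python 3.5.

```python
import faulthandler
import sys
from typing import Callable, Iterable, List, Tuple


def convert_all(
    pairs: Iterable[Tuple[str, str]],
    convert: Callable[[str, str], None],
    timeout_s: float = 600.0,
) -> List[str]:
    """Hypothetical driver: call convert(output_file, csv_file) for each pair.

    If one call runs longer than timeout_s seconds, faulthandler dumps the
    Python stack of every thread to stderr (exit=False keeps the process
    alive), which shows where a hang is stuck.
    """
    done = []  # type: List[str]
    for output_file, csv_file in pairs:
        # Arm a one-shot watchdog for this file only.
        faulthandler.dump_traceback_later(timeout_s, exit=False)
        try:
            convert(output_file, csv_file)
        finally:
            # Disarm the watchdog whether convert returned or raised.
            faulthandler.cancel_dump_traceback_later()
        done.append(csv_file)
        # Progress line, flushed immediately so it is visible before a hang.
        print("converted {} -> {}".format(csv_file, output_file),
              file=sys.stderr, flush=True)
    return done
```

Usage with the report's function might look like `convert_all([("a.parquet", "a.csv"), ("b.parquet", "b.csv")], to_parquet)`, where the file names are placeholders. The last "converted" line printed before a stack dump identifies the file that was being processed when the hang began.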