Apache Arrow / ARROW-1311

Python hangs after writing a few Parquet tables


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Python
    • Labels: None
    • Environment: Python 3.5.2, pyarrow 0.5.0

    Description

      I had a program that read some CSV files (a few million rows each, 9 columns) and converted each one to Parquet with:

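      import os
      import pandas as pd

      import pyarrow
      import pyarrow.parquet as pq

      def to_parquet(output_file, csv_file):
          # Load the whole CSV into memory as a DataFrame.
          df = pd.read_csv(csv_file)
          # Strip leading zeros from each gecco_variant value.
          df['gecco_variant'] = [v.lstrip('0') for v in df['gecco_variant']]
          # Convert to an Arrow table and write it out as a Parquet file.
          table = pyarrow.Table.from_pandas(df)
          pq.write_table(table, output_file)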

      The first CSV file always completed, but Python would hang on the second or third file, or sometimes on a much later one.
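
      For anyone trying to narrow down a hang like this, the following is a minimal sketch of a driver loop with a watchdog, not the reporter's actual program. The names convert_all and timeout_s are hypothetical; convert stands in for a function with the same (output_file, csv_file) signature as to_parquet above. It only uses the standard-library faulthandler module, which prints Python-level stacks and does not replace a native gdb backtrace.

```python
import faulthandler
import os


def convert_all(csv_files, convert, timeout_s=600):
    """Call convert(output_file, csv_file) for each CSV file in order.

    Hypothetical helper: if a single conversion runs longer than
    timeout_s seconds, faulthandler dumps the Python stack of every
    thread to stderr. The process is not killed. Pass timeout_s=None
    to disable the watchdog.
    """
    outputs = []
    for csv_file in csv_files:
        # Derive the output name by swapping the extension for .parquet.
        output_file = os.path.splitext(csv_file)[0] + ".parquet"
        if timeout_s is not None:
            # Arm the watchdog for this file only.
            faulthandler.dump_traceback_later(timeout_s)
        try:
            convert(output_file, csv_file)
        finally:
            if timeout_s is not None:
                # Disarm so a slow later step does not trigger a stale dump.
                faulthandler.cancel_dump_traceback_later()
        outputs.append(output_file)
    return outputs
```

      Called as convert_all(list_of_csv_paths, to_parquet), this would show which file was being written when the process stopped making progress, assuming the hang lasts longer than the chosen timeout.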

      Attachments

        1. backtrace.txt (31 kB, Keith Curtis)
        2. thread-apply-all-bt-full.txt (516 kB, Keith Curtis)


          People

            Assignee: Wes McKinney
            Reporter: Keith Curtis
            Votes: 0
            Watchers: 5
