  1. Apache Arrow
  2. ARROW-1311

Python hangs after writing a few Parquet tables

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Python
    • Labels:
      None
    • Environment:
      Python 3.5.2, pyarrow 0.5.0

      Description

      I have a program that reads some CSV files (a few million rows each, 9 columns) and converts each one to Parquet with:

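      import os
      import pandas as pd

      import pyarrow.parquet as pq
      import pyarrow

      def to_parquet(output_file, csv_file):
          # Load the whole CSV into memory as a DataFrame.
          df = pd.read_csv(csv_file)
          # Strip leading zeros from every value in the 'gecco_variant' column.
          df['gecco_variant'] = [v.lstrip('0') for v in df['gecco_variant']]
          # Convert to an Arrow table and write it out as a Parquet file.
          table = pyarrow.Table.from_pandas(df)
          pq.write_table(table, output_file)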

      The first CSV file always completes, but Python hangs on the second or third file, and sometimes only on a much later file.
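
Because the hang shows up on a different file from run to run, it can help to record which file was in progress when it happened. The following is a minimal sketch, not part of the original report: `convert_all`, its `timeout` value, and the `.parquet` output naming are hypothetical, and `convert` is assumed to take the same arguments as `to_parquet` above. It uses the standard-library `faulthandler` module to dump the Python-level stack of every thread if a single conversion runs too long. It does not show native C++ frames, so it complements rather than replaces a debugger backtrace.

```python
import faulthandler
import os
import sys


def convert_all(csv_files, output_dir, convert, timeout=600, log=None):
    """Call convert(output_file, csv_file) for each CSV under a hang watchdog.

    Hypothetical helper: `timeout` (seconds) is an arbitrary threshold and
    `log` must be a real file object that stays open during the call.
    """
    if log is None:
        log = sys.stderr
    written = []
    for csv_file in csv_files:
        base = os.path.splitext(os.path.basename(csv_file))[0]
        output_file = os.path.join(output_dir, base + ".parquet")
        # If this conversion takes longer than `timeout` seconds, write the
        # Python traceback of all threads to `log`; the process keeps running.
        faulthandler.dump_traceback_later(timeout, file=log)
        try:
            convert(output_file, csv_file)
        finally:
            # Disarm the watchdog once the file finished (or raised).
            faulthandler.cancel_dump_traceback_later()
        written.append(output_file)
    return written
```

Assuming the CSV paths are collected in a list, a call would look like `convert_all(csv_paths, "out", to_parquet)`. The last path that does not make it into the returned list is the file on which the process stalled.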

        Attachments

        1. thread-apply-all-bt-full.txt
          516 kB
          Keith Curtis
        2. backtrace.txt
          31 kB
          Keith Curtis

            People

            • Assignee:
              wesmckinn Wes McKinney
              Reporter:
              K94 Keith Curtis