  1. Apache Arrow
  2. ARROW-2633

[Python] Parquet file not accessible to write after first read using PyArrow

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

       
      I am trying to read a parquet file into a pandas DataFrame, do some manipulation, and write it back to the same file. However, the file does not seem to be writable after the first read within the same function.

      It only works if I skip STEP 1 below. Is there any way to unlock the file?

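      import pyarrow as pa
      import pyarrow.parquet as pq

      # Raw string so that '\a' is not read as the BEL escape.
      path = r'\dev\abc.parquet'

      # STEP 1: Read entire parquet file
      # (later pyarrow releases renamed nthreads to use_threads)
      pq_file = pq.ParquetFile(path)
      exp_df = pq_file.read(nthreads=1, use_pandas_metadata=True).to_pandas()
      # STEP 2: Change some data in dataframe
      #
      # STEP 3: Write merged dataframe back to the same file
      pyarrow_table = pa.Table.from_pandas(exp_df)
      pq.write_table(pyarrow_table, path, compression='none')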
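A side note on Windows-style paths such as \dev\abc.parquet: how the string literal is written matters, independent of pyarrow. In a plain (non-raw) Python literal, '\a' is the BEL escape, so the string no longer contains a backslash followed by 'a'. The small sketch below only illustrates that behaviour. Whether it has anything to do with the error reported here is not established by this issue, and the variable names are illustrative.

```python
# '\\dev\abc.parquet' spells the first backslash explicitly and leaves '\a'
# as an escape sequence, so it shows what a plain literal containing '\a'
# evaluates to. r'...' keeps every backslash as a literal character.
as_written = '\\dev\abc.parquet'
raw = r'\dev\abc.parquet'

print(repr(as_written))  # shows the BEL character where '\a' was
print(repr(raw))         # shows both backslashes preserved
print('\x07' in as_written, '\x07' in raw)
```

Using a raw string, forward slashes, or doubled backslashes avoids the escape.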
      

      Error:

      File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 943, in write_table
       **kwargs)
      File "C:\Python36\lib\site-packages\pyarrow\parquet.py", line 286, in __init__
       **options)
      File "_parquet.pyx", line 832, in pyarrow._parquet.ParquetWriter.__cinit__
      File "error.pxi", line 79, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Failed to open local file: \dev\abc.parquet , error: Invalid argument
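A possible workaround for rewriting a file that was just read, offered as a sketch and not taken from this issue: write the new contents to a temporary file in the same directory, then swap it in with os.replace. The helper name replace_file_atomically and its write_fn callback are hypothetical. On Windows, os.replace can still fail while another handle has the target open, so this assumes the reader from STEP 1 has already been released.

```python
import os
import tempfile


def replace_file_atomically(path, write_fn):
    """Call write_fn(tmp_path) on a sibling temp file, then swap it in for path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)  # write_fn opens the temp path itself
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)  # original stays intact until this succeeds
    except BaseException:
        # Clean up the temp file if writing or the swap failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Hypothetical use with the table from STEP 3 (requires pyarrow):
# replace_file_atomically(
#     r'\dev\abc.parquet',
#     lambda tmp: pq.write_table(pyarrow_table, tmp, compression='none'),
# )
```

If the write fails partway, the original file is left untouched, which also makes it easier to tell a failed write apart from a failed open.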
      

          People

            Assignee: Unassigned
            Reporter: Suman (suman9730)
            Votes: 0
            Watchers: 3
