Apache Arrow / ARROW-13763

[Python] Files opened for read with pyarrow.parquet are not explicitly closed


Details

    Description

      It appears that files opened for reading by pyarrow.parquet.read_table (and therefore by pyarrow.parquet.ParquetDataset) are not explicitly closed.

      This seems to be the case with both use_legacy_dataset=True and use_legacy_dataset=False. The files do not remain open at the OS level (verified using lsof); however, closing them appears to rely on the Python garbage collector.

      My use case: I'd like to use a custom fsspec filesystem that interfaces with an S3-like API. It handles the remote download of the Parquet file and passes pyarrow a handle to a temporary local copy. It then waits for an explicit close() or __exit__() call to clean up the temporary file.
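The cleanup contract described above can be sketched with the standard library alone. download_to_temp and TempDownloadFile are hypothetical names, not part of fsspec or pyarrow: the handle deletes its backing temporary file on its first close(), and the __exit__() inherited from io.IOBase calls close(). In CPython, io objects also call close() when they are finalized, which is why a reader that never closes the handle leaves cleanup to the garbage collector.

```python
import io
import os
import tempfile


def download_to_temp(data: bytes) -> str:
    """Hypothetical stand-in for the remote download: write bytes to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".parquet")
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return path


class TempDownloadFile(io.FileIO):
    """Hypothetical read handle that deletes its backing temp file on close().

    __exit__() (inherited from io.IOBase) calls close(), so a with block
    triggers cleanup; without one, cleanup waits for finalization.
    """

    def close(self) -> None:
        was_closed = self.closed
        path = self.name
        super().close()
        # Remove the file only on the first close, and only if it still exists.
        if not was_closed and os.path.exists(path):
            os.remove(path)


path = download_to_temp(b"downloaded bytes")
with TempDownloadFile(path, "r") as fh:
    payload = fh.read()

print(payload)
print(os.path.exists(path))
```

Calling close() a second time is a no-op here, so an explicit close followed by garbage collection does not try to delete the file twice.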

      Attachments

        1. test.py (1 kB, Richard Kimoto)


            People

              milesgranger Miles Granger
              kimotorc Richard Kimoto
              Votes:
              0
              Watchers:
              4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  3h 20m