  1. Apache Arrow
  2. ARROW-2683

[Python] Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()

    Details

      Description

      pyarrow version from python repl:

      >>> import pyarrow
      >>> pyarrow.__version__
      '0.9.0.post1'

      python interpreter information:

      Python 3.6.5 (default, Mar 30 2018, 06:42:10)
      [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

      arbitrary, potentially relevant system information:

      OS                       : macOS High Sierra (10.13.4)
      homebrew package         : python: stable 3.6.5 (bottled), devel 3.7.0b4, HEAD
      pip version              : pip 10.0.1
      pipenv version           : pipenv, version 2018.05.18
      pyarrow version (via pip): pyarrow         0.9.0.post1
      cython version (via pip) : Cython          0.28.2

       

      Issue Description:

      I see a ResourceWarning when reading a Parquet file with pyarrow.parquet.read_table(). It doesn't appear to be an error, but it seemed important enough (a.k.a. annoying enough) to be worth asking about. Uwe L. Korn was nice enough to respond in #general on the Arrow Slack.

      The main problem is as follows:

      1. With this code in a Python unittest:
        from pyarrow import parquet

        def test_arrow_from_parquet(self):
            table = parquet.read_table(<path as str>)

        I see this warning:

        ResourceWarning: unclosed file <_io.BufferedReader name=<path_to_file>>
      2. I tried adding the following, per Uwe's request:
        warnings.simplefilter("error")
      3. I then see this information:
        test_arrow_from_parquet (tests.datalayer_test.TestFileReader) ... Exception ignored in: <_io.FileIO name=<path_to_file> mode='rb' closefd=True>
        
        ResourceWarning: unclosed file <_io.BufferedReader name=<path_to_file>>
      4. Uwe's thoughts:
        That could be a valid error. We don’t seem to close the file we open in `ParquetFile.__init__`
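
The pattern Uwe describes can be reproduced without pyarrow. The sketch below is only an illustration, not pyarrow's code: `LeakyReader`, `ClosingReader`, and `resource_warnings` are hypothetical names. `LeakyReader` opens a file in `__init__` and never closes it, so CPython emits a ResourceWarning when the file object is finalized. Because that warning is raised during finalization, turning it into an error with `warnings.simplefilter("error")` cannot propagate an exception into the test, which is consistent with the "Exception ignored in" output in step 3. Recording the warnings instead makes them easy to assert on.

```python
import gc
import os
import tempfile
import warnings


class LeakyReader:
    """Hypothetical stand-in: opens a file in __init__ and never closes it."""

    def __init__(self, path):
        self._file = open(path, "rb")

    def read(self):
        return self._file.read()


class ClosingReader(LeakyReader):
    """Same reader, but usable as a context manager that closes the file."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._file.close()


def resource_warnings(action):
    """Run `action` and return the ResourceWarnings emitted while it ran."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
        gc.collect()  # finalize dropped objects while still recording
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def use_leaky(path):
    LeakyReader(path).read()  # reader is dropped with its file still open


def use_closing(path):
    with ClosingReader(path) as reader:
        reader.read()


# Create a small throwaway file to read (its contents do not matter here).
fd, sample_path = tempfile.mkstemp()
os.write(fd, b"not really parquet")
os.close(fd)

leaky_warnings = resource_warnings(lambda: use_leaky(sample_path))
closing_warnings = resource_warnings(lambda: use_closing(sample_path))
os.remove(sample_path)

for w in leaky_warnings:
    print(w.category.__name__, w.message)
```

Until the library closes the file itself, a possible workaround on the caller's side is to open the file in a `with` block and pass the open file object to `parquet.read_table()`. This assumes `read_table` accepts a file-like object as its source, which should be checked against the installed pyarrow version.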

         

        Attachments

        1. parquetread_test.py (0.3 kB, Aldrin)
        2. simple.parquet (1 kB, Aldrin)

              People

              • Assignee:
                kszucs Krisztian Szucs
                Reporter:
                octalene Aldrin

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m