Apache Arrow: ARROW-362

Python: Calling to_pandas on a table read from Parquet leaks memory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.2.0
    • Component/s: Python
    • Labels: None

    Description

      Steps to reproduce:

      • Read a Parquet file with pyarrow.parquet.read_table and convert the resulting table to a DataFrame with to_pandas.
      • Repeat this several times and observe that memory usage keeps increasing.

      The leak appears to occur only with this combination (reading from Parquet and then calling to_pandas). Calling gc.collect() does not reclaim the memory.
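A minimal harness for the reproduction steps above might look like the following. It is a sketch under stated assumptions, not the script used in the report: current_rss_bytes, rss_samples and read_and_convert are hypothetical helper names, process memory is read from /proc/self/statm (so it is Linux only), and pyarrow is imported lazily inside read_and_convert so the measuring helpers work without it. Resident set size is used rather than tracemalloc because Arrow's buffers come from its C++ memory pool, not the Python allocator.

```python
import gc
import os


def current_rss_bytes():
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        # statm fields are in pages: size, resident, shared, text, lib, data, dt
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_samples(step, iterations):
    """Call `step` repeatedly and record RSS after each call."""
    samples = []
    for _ in range(iterations):
        step()
        # Force a full collection so growth cannot be blamed on uncollected cycles;
        # per the report, this does not release the leaked memory.
        gc.collect()
        samples.append(current_rss_bytes())
    return samples


def read_and_convert(path):
    """The combination from the report: read_table followed by to_pandas."""
    import pyarrow.parquet as pq  # lazy import: the helpers above need only the stdlib

    df = pq.read_table(path).to_pandas()
    # Return only the row count so no reference to the DataFrame outlives the call.
    return len(df)
```

Usage, assuming a local file named example.parquet (hypothetical): samples = rss_samples(lambda: read_and_convert("example.parquet"), 20). On an affected build (0.1.0 per this issue) the samples would be expected to keep climbing from one iteration to the next; with the fix they should level off once allocator caches are warm.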

          People

            Wes McKinney (wesm)
            Uwe Korn (uwe)