Apache Arrow / ARROW-6976

Possible memory leak in pyarrow read_parquet


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.15.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels:
      None
    • Environment:
      Linux (Ubuntu 18.04)

      Description

      Version and repro info in the gist below.

      I'm not sure whether I'm misunderstanding something in this post:
      https://arrow.apache.org/blog/2019/02/05/python-string-memory-0.12/

      However, memory does seem to accumulate, and the accumulation is worse for columns holding Python objects, such as strings and dates (but not datetimes).

      I was not able to reproduce the issue on macOS. Downgrading to 0.14.1 seemed to "fix" the problem, or at least lessen it.

      https://gist.github.com/cottrell/a3f95aa59408d87f925ec606d8783e62

      Let me know if this post should go elsewhere.
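
The gist itself is not reproduced here. As a rough sketch of how the accumulation described above might be observed, the snippet below writes a Parquet file with a string column and a date column, re-reads it several times with pandas, and after each read records the process's resident memory alongside the bytes currently held by Arrow's default memory pool (`pyarrow.total_allocated_bytes()`). It assumes Linux (RSS is read from `/proc/self/statm`) and that pandas and pyarrow are installed; the helper names, column names, row count, and file path are hypothetical and not taken from the gist.

```python
import gc
import os


def current_rss_mib() -> float:
    """Return this process's resident set size in MiB (Linux only: reads /proc)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def sample_memory(fn, iterations=10, extra_probe=lambda: 0):
    """Call fn repeatedly; after each call record (RSS in MiB, extra_probe())."""
    samples = []
    for _ in range(iterations):
        fn()
        gc.collect()  # collect unreferenced Python objects before sampling
        samples.append((current_rss_mib(), extra_probe()))
    return samples


def run_repro(path, n_rows=1_000_000, iterations=10):
    """Hypothetical repro: write string and date columns once, then re-read them."""
    import datetime

    import pandas as pd
    import pyarrow as pa

    base = datetime.date(2019, 1, 1)
    df = pd.DataFrame({
        "s": [f"value_{i % 1000}" for i in range(n_rows)],  # string column
        "d": [base + datetime.timedelta(days=i % 365) for i in range(n_rows)],  # date column
    })
    df.to_parquet(path, engine="pyarrow")
    del df

    def read_once():
        # The resulting DataFrame is discarded immediately.
        pd.read_parquet(path, engine="pyarrow")

    # total_allocated_bytes() is sampled right after each read, next to RSS.
    samples = sample_memory(read_once, iterations,
                            extra_probe=pa.total_allocated_bytes)
    for i, (rss, pool_bytes) in enumerate(samples):
        print(f"read {i}: RSS {rss:.1f} MiB, "
              f"Arrow pool {pool_bytes / 2**20:.1f} MiB")
```

For example, `run_repro("/tmp/arrow_6976.parquet")` (a hypothetical path) prints one line per read. If RSS keeps climbing while the Arrow pool figure stays near zero, the memory is probably being retained by the allocator or by Python objects rather than by live Arrow buffers; that is a diagnostic heuristic, not a finding about this issue.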

        Attachments

        1. image-2019-10-23-16-17-20-739.png
          98 kB
          david cottrell
        2. pyarrow_0150.png
          49 kB
          Joris Van den Bossche
        3. pyarrow-master.png
          36 kB
          Joris Van den Bossche

              People

              • Assignee:
                Unassigned
                Reporter:
                david cottrell
              • Votes:
                0
                Watchers:
                3
