Apache Arrow / ARROW-6976

Possible memory leak in pyarrow read_parquet


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.15.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Linux, Ubuntu 18.04

    Description

      Version and repro info in the gist below.

      I'm not sure whether I'm misunderstanding something in this post, https://arrow.apache.org/blog/2019/02/05/python-string-memory-0.12/, but there seems to be memory accumulation, and it is exacerbated with higher-arity objects such as strings and dates (but not datetimes).

      I was not able to reproduce the issue on macOS. Downgrading to 0.14.1 seemed to "fix", or at least lessen, the problem.

      https://gist.github.com/cottrell/a3f95aa59408d87f925ec606d8783e62
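      The gist itself is not reproduced here. As a hedged illustration only, the sketch below shows one way the accumulation could be observed on Linux: call a read function repeatedly and record the process's peak resident set size after each call. It assumes a Unix system (the standard-library `resource` module is not available on Windows), and `peak_rss_mib`, `track_peak_rss`, and `read_with_pandas` are hypothetical helper names, not part of pyarrow or pandas.

```python
import gc
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process, in MiB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_peak_rss(fn, iterations: int = 10) -> list:
    """Call fn() repeatedly and record the peak RSS (MiB) after each call.

    Hypothetical helper for illustration: a series that keeps rising suggests
    memory is retained between calls; one that flattens after the first call
    suggests the memory is being reused.
    """
    samples = []
    for _ in range(iterations):
        result = fn()
        del result       # drop our only reference to the returned object
        gc.collect()     # collect reference cycles before sampling
        samples.append(peak_rss_mib())
    return samples


def read_with_pandas(path: str):
    """Hypothetical workload: read a Parquet file into a pandas DataFrame."""
    # Imported lazily so the helpers above depend only on the standard library.
    import pandas as pd
    return pd.read_parquet(path, engine="pyarrow")
```

      For example, `track_peak_rss(lambda: read_with_pandas("example.parquet"), iterations=20)` could be run against a hypothetical file `example.parquet` containing string and date columns. Because `ru_maxrss` is a high-water mark, the recorded series never decreases; the signal to look for is whether it keeps climbing from call to call.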

      Let me know if this post should go elsewhere.

      Attachments

        1. pyarrow-master.png
          36 kB
          Joris Van den Bossche
        2. pyarrow_0150.png
          49 kB
          Joris Van den Bossche
        3. image-2019-10-23-16-17-20-739.png
          98 kB
          david cottrell

            People

              Assignee: Unassigned
              Reporter: david cottrell
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: