Apache Arrow / ARROW-5993

[Python] Reading a dictionary column from Parquet results in disproportionate memory usage

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: Python

    Description

      I'm using pyarrow to read a 40 MB Parquet file.

      When reading all of the columns except the "body" column, the process peaks at 170 MB of memory.

      Reading only the "body" column results in over 6 GB of memory used.

      I made the file publicly accessible: s3://dhavivresearch/pyarrow/demofile.parquet

       

       

            People

              Assignee: Wes McKinney (wesm)
              Reporter: Daniel Haviv (danielil)
