Apache Arrow / ARROW-6060

[Python] Excessive memory use in pyarrow.parquet.read_table with use_threads=True


Details

    Description

   I tried to load a Parquet file of about 1.8 GB using the following code. It crashed with an out-of-memory error.

      import pyarrow.parquet as pq
      pq.read_table('/tmp/test.parquet')

   However, it worked well with use_threads=False, as follows:

      pq.read_table('/tmp/test.parquet', use_threads=False)

  Downgrading pyarrow to 0.12.1 also avoids the problem.
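  A minimal, self-contained sketch of the reported workaround. The file path and tiny dataset here are illustrative stand-ins for the reporter's ~1.8 GB /tmp/test.parquet; it also shows streaming the file in record batches via ParquetFile.iter_batches, which bounds peak memory regardless of the threaded read path:

  ```python
  import os
  import tempfile

  import pyarrow as pa
  import pyarrow.parquet as pq

  # Write a small stand-in file (the original report used ~1.8 GB of data).
  path = os.path.join(tempfile.mkdtemp(), "test.parquet")
  pq.write_table(pa.table({"x": list(range(1000))}), path)

  # Workaround from the report: disable the multi-threaded read path.
  table = pq.read_table(path, use_threads=False)

  # Alternative that also limits peak memory: stream record batches
  # instead of materializing the whole table at once.
  pf = pq.ParquetFile(path)
  total = sum(batch.num_rows for batch in pf.iter_batches(batch_size=256))
  ```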


            People

              Assignee: Ben Kietzman (bkietz)
              Reporter: Kun Liu (kwunlyou)
              Votes: 1
              Watchers: 9


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m