Apache Arrow / ARROW-18164

[Python] Dataset scanner does not follow default memory pool setting


Details

    Description

      Even after setting the system memory pool as the default, the dataset scanner still allocates from the jemalloc pool (run on Ubuntu, where jemalloc is the default unless the user changes it):

      import pyarrow as pa
      import pyarrow.dataset as ds
      import pyarrow.parquet as pq
      pq.write_table(pa.table({'a': [1, 2, 3]}), "test.parquet")
      
      In [2]: pa.set_memory_pool(pa.system_memory_pool())
      
      In [3]: pa.total_allocated_bytes()
      Out[3]: 0
      
      In [4]: table = ds.dataset("test.parquet").to_table()
      
      In [5]: pa.total_allocated_bytes()
      Out[5]: 0
      
      In [6]: pa.set_memory_pool(pa.jemalloc_memory_pool())
      
      In [7]: pa.total_allocated_bytes()
      Out[7]: 128
      


            People

              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 10m