Apache Arrow / ARROW-2879

[Python] Arrow plasma can only use a small part of specified shared memory

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      Hi, thanks for the great work on Arrow; it helps us a lot.

      However, we ran into a problem while using Plasma.

      The sample code:

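      import numpy as np
      import pyarrow as pa
      import pyarrow.plasma as plasma
      
      # Connect to the Plasma store listening on the /tmp/plasma socket.
      client = plasma.connect("/tmp/plasma", "", 0)
      
      puts = []    # references to every fetched object, so they all stay in use
      nbytes = 0   # total payload bytes stored successfully so far
      while True:
          # 1000 x 1000 float64 values: 8,000,000 bytes per array.
          a = np.ones((1000, 1000))
          try:
              oid = client.put(a)
              puts.append(client.get(oid))
              nbytes += a.nbytes
          except pa.lib.PlasmaStoreFull:
              # The store reported that it has no room for another object.
              print('use nbytes', nbytes)
              break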
      

      We started the Plasma store with 1 GB of memory, but the nbytes value printed above is only 496000000 (62 arrays of 8,000,000 bytes each), which is less than half of the memory we specified.

      I cannot figure out why Plasma can use only such a small part of the shared memory. Could anybody help me? Thanks a lot.
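
For reference, the arithmetic behind the numbers above can be sketched as follows. This is only a rough illustration: the names `ARRAY_NBYTES`, `USED_NBYTES`, and `STORE_SIZES` are made up for this example, and because the report does not say whether "1G" means 10**9 or 2**30 bytes, both readings are shown as assumptions.

```python
# Size of one np.ones((1000, 1000)) array: 1,000,000 float64 values, 8 bytes each.
ARRAY_NBYTES = 1000 * 1000 * 8

# Value printed by the sample code when PlasmaStoreFull was raised.
USED_NBYTES = 496000000

# Assumption: "1G" could mean either of these store sizes.
STORE_SIZES = {
    "1 GB (10**9 bytes)": 10**9,
    "1 GiB (2**30 bytes)": 2**30,
}

# Number of arrays that were stored before the store reported it was full.
n_puts = USED_NBYTES // ARRAY_NBYTES

# Fraction of the configured store size that held array payload.
fractions = {label: USED_NBYTES / size for label, size in STORE_SIZES.items()}

print("bytes per array:", ARRAY_NBYTES)
print("successful puts:", n_puts)
for label, fraction in fractions.items():
    print(f"{label}: {fraction:.1%} used")
```

Under either reading of "1G", the payload that fit is below 50% of the configured size, which is the gap this report asks about.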

          People

            Assignee: Unassigned
            Reporter: chineking