Apache Arrow / ARROW-1410

Plasma object store occasionally pauses for a long time

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: C++ - Plasma
    • Labels:
      None
    • Environment:
      Ubuntu 16.04

      Description

      The problem can be reproduced as follows. First start a plasma store with

      plasma_store -s /tmp/s1 -m 500000000000
      

      Then continuously put in objects using a script like the following.

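      import pyarrow.plasma as plasma
      import numpy as np
      
      # Connect to the store started above: store socket, manager socket (unused), release delay.
      client = plasma.connect('/tmp/s1', '', 0)
      
      for i in range(20000):
          print(i)
          # Create and seal an object of random size (up to 100 MB) under a random 20-byte ID.
          object_id = plasma.ObjectID(np.random.bytes(20))
          client.create(object_id, np.random.randint(0, 100000000))
          client.seal(object_id)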
      

      As the loop counters are printed, you will see long pauses. The pauses occur because we mmap pages with the MAP_POPULATE flag, which makes the mmap call block until every page in the mapping has been faulted in. Although prepopulating pages can speed up subsequent object creations, it isn't worth the long pauses. We may want to find a way to populate the pages in the background.
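
      The idea of populating pages in the background can be illustrated with a minimal Python sketch. This is not the Plasma store's code (the store is written in C++), and the helper name prefault_in_background is hypothetical. It creates an anonymous mapping without MAP_POPULATE and touches one byte per page on a worker thread, so the thread that created the mapping does not block. It assumes the pages have not yet been handed out to any client, since it writes zeros.

      ```python
      import mmap
      import threading

      PAGE_SIZE = mmap.PAGESIZE


      def prefault_in_background(region, length):
          """Touch one byte per page on a daemon thread (hypothetical helper).

          Assumes the region is not yet in use, because it writes zeros.
          """
          def touch_pages():
              for offset in range(0, length, PAGE_SIZE):
                  # Writing to a page forces the kernel to back it with memory.
                  region[offset] = 0

          worker = threading.Thread(target=touch_pages, daemon=True)
          worker.start()
          return worker


      length = 64 * PAGE_SIZE
      # Anonymous mapping created without MAP_POPULATE, so this call returns quickly.
      region = mmap.mmap(-1, length)
      worker = prefault_in_background(region, length)
      # The creating thread can continue with other work here.
      worker.join()
      ```

      A real implementation would also have to coordinate with allocations that land on pages the worker has not reached yet; the sketch ignores that.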

            People

            • Assignee: Robert Nishihara (robertnishihara)
            • Reporter: Robert Nishihara (robertnishihara)
