Apache Arrow / ARROW-1410

Plasma object store occasionally pauses for a long time

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: C++ - Plasma
    • Labels: None
    • Environment: Ubuntu 16.04

    Description

      The problem can be reproduced as follows. First start a plasma store with

      plasma_store -s /tmp/s1 -m 500000000000
      

      Then continuously put in objects using a script like the following.

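      import pyarrow.plasma as plasma
      import numpy as np
      
      # Connect to the store started above (no manager socket, release delay 0).
      client = plasma.connect('/tmp/s1', '', 0)
      
      for i in range(20000):
          print(i)
          # Create and seal an object with a random 20-byte ID and a random size below 100 MB.
          object_id = plasma.ObjectID(np.random.bytes(20))
          client.create(object_id, np.random.randint(0, 100000000))
          client.seal(object_id)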
      

      While the loop counters are printed, you will see long pauses. The cause is that we mmap pages with the MAP_POPULATE flag, which makes mmap fault in every page of the new mapping before it returns. Although this can speed up subsequent object creations, it isn't worth the long pauses. We may want to find a way to populate the pages in the background.
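      The trade-off described above can be sketched in a few lines of Python. This is not the Plasma store's code: it uses anonymous private memory rather than the store's own mapping, and the helper names map_region and prefault_in_background are hypothetical. It only shows that MAP_POPULATE moves the page-fault cost into the mmap call, and one possible way to pay that cost on a background thread instead. It assumes Linux; mmap.MAP_POPULATE is only exposed by Python 3.10 and later, so the sketch falls back to a lazy mapping when it is missing.

```python
import mmap
import threading
import time

PAGE = mmap.PAGESIZE


def map_region(size, populate):
    """Map `size` bytes of anonymous memory; return (region, seconds spent in mmap)."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if populate:
        # Assumption: Linux with Python 3.10+. Elsewhere this adds no flag,
        # so the mapping stays lazy.
        flags |= getattr(mmap, "MAP_POPULATE", 0)
    start = time.perf_counter()
    region = mmap.mmap(-1, size, flags=flags)
    return region, time.perf_counter() - start


def prefault_in_background(region):
    """Touch one byte per page on a worker thread so the caller is not blocked.

    Writing zeros is only safe while the region holds no data yet.
    """
    def touch_pages():
        for offset in range(0, len(region), PAGE):
            region[offset] = 0

    worker = threading.Thread(target=touch_pages, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    eager, eager_seconds = map_region(size, populate=True)
    lazy, lazy_seconds = map_region(size, populate=False)
    print("mmap with MAP_POPULATE:", eager_seconds)
    print("mmap without MAP_POPULATE:", lazy_seconds)
    worker = prefault_in_background(lazy)
    worker.join()  # wait before closing; the worker still writes to the region
    eager.close()
    lazy.close()
```

      With the lazy mapping, mmap returns quickly and the page faults are paid either on first access or by the worker thread. Whether a background pass like this suits the store depends on how it hands out memory, which this sketch does not model.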

          People

            Assignee: Robert Nishihara
            Reporter: Robert Nishihara