Apache Arrow / ARROW-16697

[FlightRPC][Python] Server seems to leak memory during DoPut


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug

    Description

      Hello,

      We are stress testing our Flight RPC server (PyArrow 8.0.0) with write-heavy workloads and are running into what appear to be memory leaks.

      The server is put under pressure by a number of separate clients doing DoPut. What we are seeing is that the server's memory usage only ever goes up, until the server is finally killed by k8s for hitting its memory limit.

      I have spent many hours fishing through our code for memory leaks with no success. Even short-circuiting all of our custom DoPut handling logic does not alleviate the situation. This led me to create a reproducer that uses nothing but PyArrow, and with it I see the server process memory only increasing, similar to what we see on our servers.

      The reproducer is in the attachments, and I have also included the test CSV file (20 MB) that I use for my tests. A few notes:

      • The client code has multiple threads, each emulating a separate Flight Client
      • There are two variants, and I see slightly different memory usage characteristics with each:
        • _do_put_with_client_reuse: one client is opened at the start of the thread, hammers the server with many puts, and is finally closed; leaks appear to happen faster in this variant
        • _do_put_with_client_per_request: a client opens and connects, does a put, then disconnects, looping like this many times; leaks appear to happen more slowly in this variant when there are fewer concurrent clients, and increasing the number of threads 'helps'
      • The server code handling do_put reads batch-by-batch and does nothing with the chunks

      Also, one interesting (but most likely unrelated) thing that I keep noticing is that FlightClient sometimes takes a long time to close (around 5 seconds). It happens intermittently.

      Attachments

        1. sample.csv.gz (5.72 MB, Lubo Slivka)
        2. massif.txt (144 kB, Lubo Slivka)
        3. massif_client.txt (149 kB, Lubo Slivka)
        4. leak_repro_server.py (0.6 kB, Lubo Slivka)
        5. leak_repro_client.py (1 kB, Lubo Slivka)

        People

            David Li (lidavidm)
            Lubo Slivka (lupko)
            Votes: 0
            Watchers: 4
