CouchDB / COUCHDB-1129

file descriptors sometimes not closed after compaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.1.1, 1.2
    • Component/s: Database Core
    • Labels:
      None
    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      It seems there are still cases where file descriptors are not released when compaction finishes.
      When I asked on IRC, rnewson confirmed he'd seen the behavior too, and the last comment on 926 also suggests it still occurs at times.

      Someone needs to take a careful look at any race conditions we might have between opening the file and subscribing to the compaction notification.
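
The kind of race suspected here can be sketched generically. The following is a minimal, deterministic illustration in Python, not CouchDB's actual code (CouchDB is written in Erlang). `CompactionNotifier`, `Reader`, and both helper functions are hypothetical names invented for this sketch. It assumes a reader that records which file generation it opened and only learns about a newer file through a notification callback.

```python
class CompactionNotifier:
    """Hypothetical event source that announces finished compactions."""

    def __init__(self):
        self.generation = 0          # bumped each time a compaction swaps the file
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def compaction_finished(self):
        self.generation += 1
        for callback in list(self._subscribers):
            callback(self.generation)


class Reader:
    """Hypothetical reader that holds a handle on one file generation."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.open_generation = None

    def open_file(self):
        # Stands in for opening whatever file is current right now.
        self.open_generation = self.notifier.generation

    def on_compaction(self, generation):
        # Stands in for closing the old handle and reopening the new file.
        self.open_generation = generation

    def is_stale(self):
        return self.open_generation != self.notifier.generation


def open_then_subscribe(notifier, between=None):
    """Racy order: an event fired in the gap is never delivered."""
    reader = Reader(notifier)
    reader.open_file()
    if between:
        between()                    # compaction finishes inside the window
    notifier.subscribe(reader.on_compaction)
    return reader


def subscribe_then_open(notifier, between=None):
    """Safer order: subscribe first, so no event can slip past."""
    reader = Reader(notifier)
    notifier.subscribe(reader.on_compaction)
    if between:
        between()
    reader.open_file()
    return reader


n1 = CompactionNotifier()
racy = open_then_subscribe(n1, between=n1.compaction_finished)

n2 = CompactionNotifier()
safe = subscribe_then_open(n2, between=n2.compaction_finished)

print("racy reader stale:", racy.is_stale())
print("safe reader stale:", safe.is_stale())
```

In the racy ordering the reader keeps referring to the pre-compaction generation, which in a real server would correspond to a descriptor on the old file that nothing ever tells it to close.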

          Activity

          Dan Everton added a comment -

          We're still seeing this issue with CouchDB 1.1.0. The database isn't under particularly heavy write load but every single compaction leaks file handles. This is on R14B with the default CouchDB command line settings.

          beam.smp 5283 couchdb 13u REG 253,9 80646251 770050 /apps/couchdb/.inventory_design/cee4a2835bdfb63165f1a2271260fef4.view
          beam.smp 5283 couchdb 15u REG 253,9 4025593965 49158 /apps/couchdb/inventory.couch
          beam.smp 5283 couchdb 16u REG 253,9 7186174063 49157 /apps/couchdb/.delete/e3ed87587b162839186b76c8c073cc2f (deleted)
          beam.smp 5283 couchdb 17u REG 253,9 4025593965 49158 /apps/couchdb/inventory.couch
          beam.smp 5283 couchdb 18u REG 253,9 77913 49154 /apps/couchdb/_replicator.couch
          beam.smp 5283 couchdb 23u REG 253,9 7124119663 49155 /apps/couchdb/.delete/307d7d9ee4cb3593eb72d374e622fca4 (deleted)
          beam.smp 5283 couchdb 58u REG 253,9 4185 49153 /apps/couchdb/_users.couch
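
Output like the above can be filtered for leaked handles mechanically. The sketch below is a small Python helper, assuming the default `lsof` column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME) that the listing appears to follow. `OpenFile`, `parse_lsof_line`, and `leaked_files` are names made up for this example, and the sample lines are copied from the listing.

```python
from typing import List, NamedTuple


class OpenFile(NamedTuple):
    command: str
    pid: int
    fd: str
    size: int
    path: str
    deleted: bool


DELETED_SUFFIX = " (deleted)"


def parse_lsof_line(line: str) -> OpenFile:
    # Split into at most 9 fields so the NAME column keeps its spaces.
    command, pid, _user, fd, _type, _device, size, _node, name = line.split(None, 8)
    deleted = name.endswith(DELETED_SUFFIX)
    path = name[: -len(DELETED_SUFFIX)] if deleted else name
    return OpenFile(command, int(pid), fd, int(size), path, deleted)


def leaked_files(lines: List[str]) -> List[OpenFile]:
    """Return entries whose file was unlinked but is still held open."""
    return [f for f in (parse_lsof_line(l) for l in lines if l.strip()) if f.deleted]


SAMPLE = """\
beam.smp 5283 couchdb 15u REG 253,9 4025593965 49158 /apps/couchdb/inventory.couch
beam.smp 5283 couchdb 16u REG 253,9 7186174063 49157 /apps/couchdb/.delete/e3ed87587b162839186b76c8c073cc2f (deleted)
beam.smp 5283 couchdb 23u REG 253,9 7124119663 49155 /apps/couchdb/.delete/307d7d9ee4cb3593eb72d374e622fca4 (deleted)
"""

leaks = leaked_files(SAMPLE.splitlines())
total = sum(f.size for f in leaks)

for f in leaks:
    print(f"fd {f.fd} pid {f.pid}: {f.path} ({f.size} bytes)")
print("bytes held by deleted files:", total)
```

Summing the SIZE/OFF column for the deleted entries gives a rough figure for the disk space that stays allocated until the descriptors are closed.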

          Paul Joseph Davis added a comment -

          Are there any view updates running?

          Dan Everton added a comment -

          Yes there are reads from a view at the same time as the compaction.

          Jan Lehnardt added a comment -

          The fix version for this is 1.2, can you try this with the 1.2.x or master branches?

          Filipe Manana added a comment -

          Can you test with the 1.1.x branch (upcoming 1.1.1), or anything else more bleeding edge like 1.2.x or trunk?

          Paul Joseph Davis added a comment -

          @Dan Your compaction files won't go away until anything using them has also gone away. So I'd first make sure you don't have a long view read, view updater, or other long lived connection holding open the database.
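
The underlying behavior is ordinary POSIX unlink semantics: removing a file's name does not free its data while any process still has it open. A minimal Python sketch of that, assuming a POSIX system such as the Linux hosts discussed in this ticket (Windows behaves differently); `read_after_unlink` is a name invented for this example.

```python
import os
import tempfile
from typing import Tuple


def read_after_unlink(payload: bytes) -> Tuple[bool, bytes]:
    """Write a file, unlink it while it is open, then read it back via the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                      # the name is gone from the directory
        still_listed = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))     # the data is still reachable
    finally:
        os.close(fd)                         # only now can the space be reclaimed
    return still_listed, data


still_listed, data = read_after_unlink(b"old database file")
print("path still exists:", still_listed)
print("bytes readable through open fd:", len(data))
```

This is why a handle left open on a file under `.delete` keeps the disk space allocated even though the file no longer appears in a directory listing.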

          Dan Everton added a comment -

          The database is mostly write-only with occasional hits to a view. During compaction the view will be requested a few times but always closed immediately after. So there should be nothing holding the view open.

          I'm trying to get a 1.1.1 build going but we're stuck on RHEL5 and it seems CouchDB no longer builds on that. Possibly something to do with the SpiderMonkey 1.8.5 changes but I'm still investigating.

          Filipe Manana added a comment -

          Dan, can it be that, without restarting the server, you updated the design document (potentially several times) and then issued a /db/_view_cleanup request? (COUCHDB-1309)

          Dan Everton added a comment -

          No, I don't think it's that bug. The CouchDB instance has been restarted several times to free up the file descriptors without changing the design documents.

          Dan Everton added a comment -

          Our initial testing indicates that this is indeed fixed in CouchDB 1.1.1. We've been running for a little while under heavy write and view load with compactions and have not seen any file handle leaks.

          Filipe Manana added a comment -

          Good to know.
          If it happens again, please reopen this ticket.
          Thanks for testing it.

          Paul Hirst added a comment -

          This has just happened to me on 1.1.1 for the very first time. A compaction process left the old database file handle open (the file had been moved to .delete and deleted according to lsof). The compaction had finished 11 hours before I noticed I hadn't got the disk space back, so a fair amount of time had passed. There weren't any long-running view updates occurring. The database is under fairly continuous write load from processes using keep-alive HTTP connections. Would this be the problem?

          This is the first time I've seen this in probably 20 or 30 compactions on 1.1.1 so I doubt I can replicate it.

          I called _restart, the couch service stopped responding to requests for a few seconds (probably while the file was removed, certainly longer than normal restarts) and then everything was fine. There are no errors in the log. In fact, I have logging set to 'error' and there is nothing in the log at all.

          Paul Hirst added a comment -

          dch on IRC advised me that this has probably been fixed in 1.2 so I'm going to wait and see.


            People

            • Assignee:
              Unassigned
            • Reporter:
              Randall Leeds
