CouchDB
  COUCHDB-1171

Multiple requests to _changes feed causes {error, system_limit} "Too many processes"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2, 1.0.3, 1.1
    • Fix Version/s: 1.0.3, 1.1, 1.2
    • Component/s: None
    • Labels:
      None
    • Skill Level:
      New Contributors Level (Easy)

      Description

      Originally I investigated issue 182 of the couchdb-python package, where calling the db.changes() function over 32768 times produces the following messages in the CouchDB log:

      [Thu, 19 May 2011 14:03:26 GMT] [info] [<0.2909.0>] 127.0.0.1 - - 'GET' /test/_changes 200
      [Thu, 19 May 2011 14:03:26 GMT] [error] [emulator] Too many processes
      [Thu, 19 May 2011 14:03:26 GMT] [error] [<0.2909.0>] Uncaught error in HTTP request:

      {error,system_limit}
      [Thu, 19 May 2011 14:03:26 GMT] [info] [<0.2909.0>] Stacktrace: [{erlang,spawn, [erlang,apply, [#Fun<couch_stats_collector.1.123391259>,[]]]},
      {erlang,spawn,1},
      {couch_httpd_db,handle_changes_req,2},
      {couch_httpd_db,do_db_req,2},
      {couch_httpd,handle_request_int,5},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]
      [Thu, 19 May 2011 14:03:26 GMT] [info] [<0.2909.0>] 127.0.0.1 - - 'GET' /test/_changes 500

      More info about this issue can be found here: http://code.google.com/p/couchdb-python/issues/detail?id=182

      However, I still couldn't reproduce this error using only the httplib module directly, but I got the same behavior using the feed=longpool option:

      from httplib import HTTPConnection

      def test2():
          conn = HTTPConnection('localhost:5984')
          conn.connect()
          i = 0
          while True:
              conn.putrequest('GET', '/test/_changes?feed=longpool')
              conn.endheaders()
              conn.getresponse().read()
              i = i + 1
              if i % 100 == 0:
                  print i

      When i gets to around 32667, the following exception is raised:
      Traceback (most recent call last):
      File "/home/kxepal/projects/couchdb-python/issue-182/test.py", line 259, in <module>
      test2()
      File "/home/kxepal/projects/couchdb-python/issue-182/test.py", line 239, in test2
      resp.read()
      File "/usr/lib/python2.6/httplib.py", line 522, in read
      return self._read_chunked(amt)
      File "/usr/lib/python2.6/httplib.py", line 565, in _read_chunked
      raise IncompleteRead(''.join(value))
      httplib.IncompleteRead: IncompleteRead(0 bytes read)

      [Thu, 19 May 2011 14:10:20 GMT] [info] [<0.3240.4>] 127.0.0.1 - - 'GET' /test/_changes?feed=longpool 200
      [Thu, 19 May 2011 14:10:20 GMT] [error] [emulator] Too many processes
      [Thu, 19 May 2011 14:10:20 GMT] [error] [<0.3240.4>] Uncaught error in HTTP request: {error,system_limit}

      [Thu, 19 May 2011 14:10:20 GMT] [info] [<0.3240.4>] Stacktrace: [{erlang,spawn, [erlang,apply, [#Fun<couch_stats_collector.1.123391259>,[]]]},
      {erlang,spawn,1},
      {couch_httpd_db,handle_changes_req,2},
      {couch_httpd_db,do_db_req,2},
      {couch_httpd,handle_request_int,5},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]
      [Thu, 19 May 2011 14:10:20 GMT] [info] [<0.3240.4>] 127.0.0.1 - - 'GET' /test/_changes?feed=longpool 500

      Same error. I know the test function is quite far from a real use case, but is this correct behavior, and couldn't it be exploited for malicious aims?
      This exception occurs only for multiple requests to the changes feed within a single connection; chunked lists and attachments are not affected, if I've done everything right.

      Test environment:
      Gentoo Linux 2.6.38
      CouchDB 1.0.2 release
      couchdb-python@63feefd9e3b6
      Python 2.6.6

      If any additional information is needed, I can try to provide it.
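
      The numbers above are consistent with a one-leaked-process-per-request model: the Erlang VM's default process limit at the time was 32768 (+P), so if each _changes request on a reused connection leaves one process behind, the limit is hit after roughly 32768 requests minus whatever processes CouchDB already holds at rest. A back-of-the-envelope sketch (Python; the baseline of 101 resting processes is an assumed, illustrative figure chosen to match the ~32667 observed above):

      ```python
      # Back-of-the-envelope model of the leak. The resting process count
      # BASELINE_PROCESSES is an assumption (not from the ticket), chosen so
      # the model reproduces the ~32667 requests observed in the description.
      DEFAULT_PROC_LIMIT = 32768   # Erlang VM default process limit (+P) at the time
      BASELINE_PROCESSES = 101     # assumed processes CouchDB holds at rest

      def requests_until_limit(proc_limit, baseline, leaked_per_request=1):
          """Requests on one reused connection before {error, system_limit},
          if each request leaks `leaked_per_request` processes."""
          return (proc_limit - baseline) // leaked_per_request

      print(requests_until_limit(DEFAULT_PROC_LIMIT, BASELINE_PROCESSES))  # prints 32667
      ```

      The same model also fits Matt Goodall's later repro with "+P 256", where the limit is hit after about 139 requests.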

        Activity

        Alexander Shorin added a comment -

        Same behavior for the 1.0.3 and 1.1.0 tags.

        Robert Newson added a comment -

        Is this a CouchDB bug? It seems you are simply hitting your system's limit.

        Alexander Shorin added a comment -

        That's the question, because no other resource behaves this way: chunked lists, attachments, shows, views, etc. I know I could simply raise the Erlang process count, but that wouldn't be the default out-of-the-box configuration, and it looks like a workaround rather than a solution. I don't know the Erlang specifics and may be wrong, but could this behavior be used for some kind of DoS attack? Not to crash CouchDB itself, but the related applications that wouldn't expect this error.

        Robert Newson added a comment -

        I did not assign this to myself.

        Randall Leeds added a comment -

        This issue is not a bug with CouchDB. System limits and Erlang limits may need to be tweaked for high volume deployments.

        There is a wiki page on performance tuning with relevant information. The reason it only appears with _changes is that feed=longpoll or feed=continuous are the only reliable ways to make long-lasting HTTP connections.

        Raising the limits to ridiculously high values is totally acceptable and should not crater your box (thanks, Erlang). Protecting against DDoS attacks that look like legitimate traffic is incredibly difficult and, IMHO, outside the responsibility of CouchDB. If anyone is running mission critical systems they should probably look at some DDoS protection appliances or proxies.

        If there's a good argument the other way, please re-open the ticket.

        Matt Goodall added a comment -

        I think there may be a problem here actually. What's happening is that couch_httpd_db:handle_changes_req1/2 always calls couch_stats_collector:track_process_count/1 and that spawns a MonitorFun process to watch for the end of the _changes process. However, MonitorFun blocks on its receive until the TCP socket is closed by the client, even when a normal or longpoll request's chunks have been sent and the request is complete.

        That means that a client that reuses HTTP connections, and happens to GET /some_db/_changes, may eventually starve CouchDB of processes. As it happens, the next request to CouchDB also fails, which basically restarts CouchDB and clears the problem.

        It's really easy to repeat: simply add a "+P 256" to the erl command line to give erl a tiny number of processes to play with and run the following Python code:

        import httplib
        import itertools

        conn = httplib.HTTPConnection('localhost:5984')
        for i in itertools.count():
            if not i % 100:
                print i, '...'
            conn.request('GET', '/scratch/_changes')
            resp = conn.getresponse()
            if resp.status != 200:
                break
            resp.read()

        print 'error after:', i

        I can only manage 139 requests here before CouchDB runs out of processes.

        The fix is probably to explicitly call couch_stats_collector's increment/1 and decrement/1 functions from the couch_httpd_db module instead of calling couch_stats_collector:track_process_count/1 and relying on the process to end.
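
        Until a server-side fix like that lands, the diagnosis also implies a client-side mitigation: since the leaked MonitorFun processes only exit when the TCP socket closes, recycling the connection every N requests bounds the leak at roughly N processes. A minimal sketch (Python 3's http.client here, unlike the Python 2 snippets above; the host, database name, and the window of 50 are placeholder assumptions):

        ```python
        # Client-side mitigation sketch. Instead of reusing one keep-alive
        # connection for every _changes poll, close and reopen the connection
        # every `every` requests, so the server-side monitor processes tied to
        # the old socket can exit.
        from http.client import HTTPConnection

        def should_reconnect(i, every=50):
            """Recycle the connection before request i once a full window has elapsed."""
            return i > 0 and i % every == 0

        def poll_changes(host, db, polls, every=50):
            conn = HTTPConnection(host)
            try:
                for i in range(polls):
                    if should_reconnect(i, every):
                        conn.close()   # lets the server release per-socket processes
                        conn = HTTPConnection(host)
                    conn.request('GET', '/%s/_changes' % db)
                    resp = conn.getresponse()
                    resp.read()        # drain the body so the connection can be reused
            finally:
                conn.close()
        ```

        This only bounds the client's own contribution to the leak; it does nothing about other clients, which is why the server-side fix is still needed.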

        Matt Goodall added a comment -

        Attached is a possible fix and a JS test. I'm not sure how reliable the test is as it probably relies on the browser reusing the connection.

        Alexander Shorin added a comment -

        In case anyone noticed: "longpool" in the test function and logs was actually a typo, not a mistake. I was only told about it a few days ago(: So now there are two error states:
        1. Requesting the _changes resource with the default feed parameter keeps the connection alive, but produces a 500 HTTP error and continuously writes system_limit errors to the logs.
        2. Requesting the _changes resource with an incorrect feed parameter forces the connection to close after the system_limit is reached, due to the IncompleteRead exception.
        With the correct longpoll feed parameter, state 2 is what happens.

        Adam Kocoloski added a comment -

        Nice find, Matt. The proposed patch looks good to me.


          People

          • Assignee:
            Robert Newson
            Reporter:
            Alexander Shorin
          • Votes:
            1
          • Watchers:
            3
