  1. HBase
  2. HBASE-6747

Enable server side limit default on the size of results returned (to prevent OOMs)

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Implemented

    Description

      We have seen a couple of situations where a client fetching a large row brought the whole server down, due to long GC pauses or out-of-memory errors.

      This should be easily avoidable if the client uses a Scan instead of a Get, and/or uses batching to reduce the response size. But this seems difficult to enforce. Moreover,
      once in a while there may be genuine outliers or bad clients that cause such large requests.

      We need to handle such situations gracefully, and not have the RS (region server) restart for things that can be prevented. The proposal here is to enforce a maximum response size
      on the server side, so that we do not depend on the client's good behavior to keep the server running.

      We already log large responses. But if the response is too large, it just kills the server and nothing is logged, so the only way to find the cause is to go through the heap dump.
      More importantly, our availability/reliability numbers go down because the whole region/regionserver fails instead of just the single bad request.

      I think it would be useful for the server to maintain a maximum response size that it will serve. Something large like 2-3G, so that normal operations
      are not affected. If a single get/scan operation exceeds that size, we just throw an exception for the request. This will
      a) prevent the RS from going on and on until it runs out of memory, and
      b) give the clients, and us, a cleaner way to see what the problem is.
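
The proposal above could be sketched roughly as follows. This is only an illustration in plain Java, not HBase code: the class ResponseSizeLimiter, the exception ResponseTooLargeException, and the helper buildResponse are hypothetical names, and the sketch assumes the server can check the size of each cell before buffering it. The limit is a long because a 2-3G cap does not fit in an int.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: this is not HBase code, only an illustration
// of a per-request size cap enforced on the server side.
final class ResponseSizeLimiter {

    /** Thrown for the single offending request; the server itself keeps running. */
    static final class ResponseTooLargeException extends RuntimeException {
        ResponseTooLargeException(String message) {
            super(message);
        }
    }

    private final long maxResponseBytes;
    private long accumulatedBytes = 0;

    ResponseSizeLimiter(long maxResponseBytes) {
        if (maxResponseBytes <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxResponseBytes = maxResponseBytes;
    }

    /** Call before buffering each cell, so the limit trips before memory is used. */
    void add(long cellBytes) {
        if (cellBytes < 0) {
            throw new IllegalArgumentException("cell size must be non-negative");
        }
        // Written as a subtraction so the comparison cannot overflow.
        if (cellBytes > maxResponseBytes - accumulatedBytes) {
            throw new ResponseTooLargeException(
                "response would exceed " + maxResponseBytes + " bytes (already buffered "
                    + accumulatedBytes + ", next cell " + cellBytes + ")");
        }
        accumulatedBytes += cellBytes;
    }

    long accumulatedBytes() {
        return accumulatedBytes;
    }

    /** Assembles one get/scan response, failing only this request if it is too large. */
    static List<byte[]> buildResponse(Iterable<byte[]> cells, long maxResponseBytes) {
        ResponseSizeLimiter limiter = new ResponseSizeLimiter(maxResponseBytes);
        List<byte[]> response = new ArrayList<>();
        for (byte[] cell : cells) {
            limiter.add(cell.length);
            response.add(cell);
        }
        return response;
    }
}
```

With this shape, a request handler would catch ResponseTooLargeException, log its message, and return an error to that one client, which covers both a) and b) above without a heap dump.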

          People

            Assignee: Unassigned
            Reporter: Amitanand Aiyer (amitanand)