Issue Details (XML | Word | Printable)

Key: HADOOP-4802
Type: Bug Bug
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: stack
Votes: 0
Watchers: 6
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

RPC Server send buffer retains size of largest response ever sent

Created: 07/Dec/08 07:45 PM   Updated: 13/Jan/09 07:47 PM
Return to search
Component/s: ipc
Affects Version/s: 0.18.2, 0.19.0
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Reference
 


 Description  « Hide
The stack-based ByteArrayOutputStream in Server.Hander is reset each time through the run loop. This will set the BAOS 'size' back to zero but the allocated backing buffer is unaltered. If during an Handlers' lifecycle, any particular RPC response was fat – Megabytes, even – the buffer expands during the write to accommodate the particular response but then never shrinks subsequently. If a hosting Server has had more than one 'fat payload' occurrence, the resultant occupied heap can provoke memory woes (See https://issues.apache.org/jira/browse/HBASE-900?focusedCommentId=12654009#action_12654009 for an extreme example; occasional payloads of 20-50MB with 30 handlers robbed the heap of 700MB).

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
stack added a comment - 07/Dec/08 08:24 PM
Suggested fix: reset BAOS if larger than N * initial buffer size.

Doug Cutting added a comment - 08/Dec/08 06:57 PM
I'd rather not see new configuration options for this. We should rather try hard to pick values that work well in all cases, if possible.

I wonder if the simpler change of not re-using the buffer might work just as well. If we simply allocate a new buffer per request whose initial size is 10k. I would be surprised if that allocation would be a significant RPC bottleneck, but we should of course test that with a benchmark.


Raghu Angadi added a comment - 08/Dec/08 07:15 PM
I was thinking on the same lines. Allocation per request would not hurt (mostly because various other extra costs). Coule of improvements by subclassing BAOS:
  1. Keep a fixed buffer 10kb. inside reset() replace the previous buffer with this. This avoids allocation for most RPCs.
  2. extra benefit : this would allow us to avoid a copy if the response could be written to socket in-line (common case). Right now server always copies (through asByteArray()).

Note that this jira will increase the buffer copies for large (multi-MB) responses because of copy each time buffer expands. (Partly negated by copies reduced by HADOOP-4797, since it takes multiple write() invocations to write all the data. in jdk, each write copies all the data!). A future improvement could be to write our own stream that uses a list of buffers.


Raghu Angadi added a comment - 08/Dec/08 09:53 PM
> extra benefit : this would allow us to avoid a copy if the response could be written to socket in-line (common case).

Once we have this (here or in a future jira), the fixed buffer could be a direct buffer and that will remove yet another copy.


stack added a comment - 09/Dec/08 08:27 PM
Thanks for the feedback lads.

I like your suggestions Raghu. This patch removes configuration and makes an attempt at #1. #2 and the third suggestion of direct buffer to socket would take more work (though we should do them, yeah).


stack added a comment - 09/Dec/08 08:32 PM
Patch applies to 0.19 and to TRUNK.

Doug Cutting added a comment - 09/Dec/08 09:10 PM
I'm still not convinced we should do more than replace the buf.reset() with buf = new ByteArrayOutputStream() and remove the initialization of buf altogether.

stack added a comment - 09/Dec/08 10:32 PM
> I'm still not convinced we should do more than replace the buf.reset() with buf = new ByteArrayOutputStream() and remove the initialization of buf altogether.

Above is predicated on our running a 'benchmark'. What would you suggest I run?

Why create new objects when it can be avoided (when response < 10k)?


Raghu Angadi added a comment - 09/Dec/08 10:40 PM
yes, other buffering improvements are more work and better suited in a new jira. Btw, we don't need the check 'buf != initialBuffer'.

Couple of advantages to current patch :

  1. we can clearly say this won't change performance for common case. Though I agree that creating new stream (i.e. two more allocations) each time won't be noticeable (except ay be more frequent partial G/C on benchmarks).
  2. Subclassing BAOS (or a new output stream) will be required for other buffering improvements any way.

stack added a comment - 09/Dec/08 10:44 PM
v3 removes the unnecessary equality test (Thanks Raghu)

Raghu Angadi added a comment - 09/Dec/08 10:48 PM
Also, the new class needs to be static.

Subjective : I don't think we should have to use jira numbers in comments. I think this is a specific enough problem that comment or javadoc could explain it in enough detail. Readers don't need to go to jira.


stack added a comment - 09/Dec/08 11:40 PM
Made HBAOS static (Had to move it out of Server#Handler to do so). Removed pointer to this issue from comments (Can go to svn log if really needed).

Raghu Angadi added a comment - 09/Dec/08 11:57 PM
+1 for the patch.

Doug Cutting added a comment - 10/Dec/08 12:41 AM
> Why create new objects when it can be avoided (when response < 10k)?

It's probably a premature optimization. It adds code. Less is more.


stack added a comment - 10/Dec/08 04:57 AM
> It's probably a premature optimization. It adds code. Less is more.

Do the above add up to a -1 on v4 of the patch?


Doug Cutting added a comment - 10/Dec/08 05:58 AM
> Do the above add up to a -1 on v4 of the patch?

No, more like a +0. Without benchmarking its safest to not change things much. Do we have a good pure RPC benchmark?


Edward J. Yoon added a comment - 12/Dec/08 07:15 AM
> stack - 09/Dec/08 12:32 PM
> Patch applies to 0.19 and to TRUNK.

I hope it'll be fixed at the 0.18.3 release.


Raghu Angadi added a comment - 13/Dec/08 10:47 PM
TestRpcCpu micro-benchmark attached to HADOOP-4797 can be used to asses the CPU penalty of this patch for larger buffers. May not matter much since we do expect extra copies with this patch for larger responses.

stack added a comment - 13/Jan/09 07:47 PM
Removed my patches. Write me offline if want to know why.