Konstantin Shvachko made changes - 06/Nov/07 01:55 AM
Rajagopal Natarajan made changes - 22/Nov/07 04:25 PM
Rajagopal Natarajan added a comment - 26/Nov/07 07:07 AM
Before I start writing the code, I have a question about this improvement. As far as I can see, the maximum bytes per checksum is 512 by default, and the default block size is 67108864 bytes. This means that if chunks were written to the socket directly, each with its checksum (as opposed to accumulating chunks and writing them out once a full block is ready), we would be decreasing the data:header ratio by a big factor, wouldn't we? Wouldn't this be inefficient? Or am I missing something?
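For reference, the numbers quoted in this question work out as below. This is only a back-of-the-envelope sketch: the 4-byte checksum size is an assumption (a CRC32 per chunk), and any per-chunk framing header would come on top of it.

```python
# Defaults quoted in the comment above.
BYTES_PER_CHECKSUM = 512
BLOCK_SIZE = 67108864  # 64 MiB

# Assumption: each chunk carries one 4-byte CRC32.
CHECKSUM_SIZE = 4

chunks_per_block = BLOCK_SIZE // BYTES_PER_CHECKSUM
checksum_bytes_per_block = chunks_per_block * CHECKSUM_SIZE
data_to_checksum_ratio = BLOCK_SIZE // checksum_bytes_per_block

print(chunks_per_block)          # 131072
print(checksum_bytes_per_block)  # 524288
print(data_to_checksum_ratio)    # 128
```

Under these assumptions the checksum bytes are 1/128 of the data, and that fraction depends only on the bytes-per-checksum setting, not on when the chunks are flushed to the socket.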
In my initial implementation of
If we want to get rid of extra buffer copies, I would look into one of these two:
+1
We should not be doing these copies and interleaves if we can avoid them. Can we just memory map the block and then copy the requested chunk directly to the socket, or use other tricks to reduce copies further? (I'm NIO naive.) Still +1, but apologies. I was thinking about the read case when I wrote the comment; I think what Raghu stated makes more sense without my additions.
Rajagopal, I do not see how the data:header ratio is decreasing here.
This issue is mainly about removing the interleaving buffer layout. Namely, now we partition the original data into chunks,
I propose to change it [back] to
If you add a header before each data and crc chunk then in current approach you will have 2*n headers, while in the proposed This should let us get rid of that extra buffer that is used to collect all the interleaved pieces together. And thus the issue is not about "writing the chunks to the socket directly", but rather about removing chunks altogether. Eric, why do you think transferring crc before the data would require less RAM on the client? I think you meant:
I.e., there will be a crc for each chunk in the original data, but the original data will not be broken into chunks. Is that right? Also, "the original data (not partitioned into chunks)" does not mean the full 128MB, right? It is up to something like 64kB, or whatever the I/O buffer size is...
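The layout described in this comment can be sketched roughly as follows. This is a minimal illustration, not the Hadoop implementation: the function names are hypothetical, CRC32 with a 4-byte big-endian encoding is an assumption, and the data/crc order inside each interleaved pair is also assumed, since the thread leaves ordering open.

```python
import struct
import zlib

BYTES_PER_CHECKSUM = 512  # default mentioned earlier in the thread


def chunk_crcs(data: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM):
    """Return a list with one 4-byte big-endian CRC32 per chunk of data."""
    return [
        struct.pack(">I", zlib.crc32(data[i:i + bytes_per_checksum]))
        for i in range(0, len(data), bytes_per_checksum)
    ]


def interleaved_layout(data: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> bytes:
    """chunk0, crc0, chunk1, crc1, ... assembled in an extra buffer."""
    out = bytearray()
    for n, crc in enumerate(chunk_crcs(data, bytes_per_checksum)):
        start = n * bytes_per_checksum
        out += data[start:start + bytes_per_checksum]  # copy of the chunk
        out += crc
    return bytes(out)


def separate_layout(data: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM):
    """Return (checksums, data): the data is passed through untouched."""
    return b"".join(chunk_crcs(data, bytes_per_checksum)), data


data = bytes(range(256)) * 5           # 1280 bytes: chunks of 512, 512, 256
crcs, payload = separate_layout(data)
print(len(crcs))                       # 12 (three 4-byte checksums)
print(payload is data)                 # True (no copy of the data)
print(len(interleaved_layout(data)))   # 1292
```

Both layouts carry the same number of checksum bytes; the difference is that the separate layout never copies the data into an intermediate buffer.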
Edit: This is what I mean by 2nd option in 2nd comment above. Yes on both comments.
I'm not sure order actually matters. I can think of arguments for either.
@Konstantin
Apologies. I had misunderstood your proposal: I thought it was to avoid the use of backupstream and write each chunk-crc pair to the socket directly instead of buffering. Now I understand what you meant.
Nigel Daley made changes - 22/Jan/08 07:32 PM
Rajagopal, I am planning to implement this. Let me know if you have already made progress. My approach is to have multiple "checksum chunks" in each DATA_CHUNK, as discussed in multiple comments above.
Also I am planning to do read and write side as separate patches. The read side should help with HADOOP-2144.
Hi Raghu, Please go ahead with it. I haven't progressed much. Only started with it, and couldn't get much time after that.
Rajagopal Natarajan made changes - 30/Jan/08 08:18 AM
Hi Raghu,
I hear we were planning to do some rework of the read protocols in 17 to make them similar to the new write protocols. Correct? I would think we would want to coordinate this work with that. This would imply that per client read we would ship the CRCs for all requested bytes followed by all requested bytes for any given client request, right (or data/crc)? It's not clear what you are referring to by buffer in your comment. In the final protocol, I don't think the data node should do any CRC interleaving per client request, do you? Is some of this discussed in another jira? Could you provide a reference if so?
Raghu Angadi made changes - 31/Jan/08 09:27 PM
I chatted with Eric and Konstantin. I think we are on the same page now. Essentially this jira will reduce memory (buffer) copies by rearranging how data is read/written to/from sockets/disk.
I filed a follow up jira
Raghu Angadi made changes - 14/May/08 06:39 AM
Doug Cutting made changes - 16/Sep/08 05:32 PM
Owen O'Malley made changes - 08/Jul/09 04:42 PM