Details
-
Improvement
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
None
-
None
-
None
Description
For some typical workloads, the throughput of the namenode is bottlenecked by the rate of transactions that are being logged into tghe edits log. In the current code, a batching scheme implies that all transactions do not have to incur a sync of the edits log to disk. However, the existing batch-ing scheme can be improved.
One option is to keep two buffers associated with edits file. Threads write to the primary buffer while holding the FSNamesystem lock. Then the thread release the FSNamesystem lock, acquires a new lock called the syncLock, swaps buffers, and flushes the old buffer to the persistent store. Since the buffers are swapped, new transactions continue to get logged into the new buffer. (Of course, the new transactions cannot complete before this new buffer is sync-ed).
This approach does a better job of batching syncs to disk, thus improving performance.
Attachments
Attachments
Issue Links
- is related to
-
HADOOP-1874 lost task trackers -- jobs hang
- Closed