Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.13.1
-
None
-
None
Description
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
UnderReplicatedBlocks PendingReplicationBlocks recentInvalidateSets excessReplicateMap
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
class INode { String name; INode parent; TreeMap<String, INode> children; Block blocks[]; short blockReplication; long modificationTime; } class Block { long blkid; long len; } class BlockInfo { FSDirectory.INode inode; DatanodeDescriptor[] nodes; Block block; }
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
- Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2, because String represents each name character by 2 bytes.
- Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
- Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2
- Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
- Size of Block. (32 bytes)
- Size of BlockInfo. (64 + 8*replication bytes)
- Reference to the block from INode.blocks (8 bytes)
- HashMap.Entry referencing the block from BlocksMap. (48 bytes)
- References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
Object | size estimate (bytes) | typical size (bytes) |
---|---|---|
File | 224 + 2 * fileName.length | 250 |
Directory | 264 + 2 * fileName.length | 290 |
Block | 152 + 72 * replication | 368 |
One of our clusters has
- Files: 10 600 000
- Dirs: 310 000
- Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
- INode.children should be an ArrayList instead of a TreeMap. Already done
HADOOP-1565. (-48 bytes) - Factor out the INode class into a separate class (-8 bytes)
- Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes) - String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
- Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes) - Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24) - Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes) - Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes) - Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
Object | size estimate (bytes) | typical size (bytes) | current typical size (bytes) |
---|---|---|---|
File | 112 + fileName.length | 125 | 250 |
Directory | 144 + fileName.length | 155 | 290 |
Block | 112 + 24 * replication | 184 | 368 |
Attachments
Attachments
Issue Links
- incorporates
-
HADOOP-1743 INode refactoring
-
- Closed
-
-
HADOOP-1759 File name should be represented by a byte array instead of a String
-
- Closed
-
-
HADOOP-1766 Merging Block and BlockInfo classes on name-node.
-
- Closed
-