
|
If you were logged in you would be able to see more operations.
|
|
|
|
File Attachments:
|
|
|
Issue Links:
|
Incorporates
|
|
This issue incorporates:
|
|
|
|
|
 |
HADOOP-1766
Merging Block and BlockInfo classes on name-node.
|
|
|
|
|
HADOOP-1759
File name should be represented by a byte array instead of a String
|
|
|
|
|
|
|
|
| Resolution Date: |
07/Sep/07 08:55 PM
|
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
- Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2, because String represents each name character by 2 bytes.
- Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
- Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2
- Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
- Size of Block. (32 bytes)
- Size of BlockInfo. (64 + 8*replication bytes)
- Reference to the block from INode.blocks (8 bytes)
- HashMap.Entry referencing the block from BlocksMap. (48 bytes)
- References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
| Object |
size estimate (bytes) |
typical size (bytes) |
| File |
224 + 2 * fileName.length |
250 |
| Directory |
264 + 2 * fileName.length |
290 |
| Block |
152 + 72 * replication |
368 |
One of our clusters has
- Files: 10 600 000
- Dirs: 310 000
- Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
- INode.children should be an ArrayList instead of a TreeMap. Already done
HADOOP-1565. (-48 bytes)
- Factor out the INode class into a separate class (-8 bytes)
- Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
- String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
- Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
- Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
- Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
- Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
- Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
| Object |
size estimate (bytes) |
typical size (bytes) |
current typical size (bytes) |
| File |
112 + fileName.length |
125 |
250 |
| Directory |
144 + fileName.length |
155 |
290 |
| Block |
112 + 24 * replication |
184 |
368 |
|
|
Description
|
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
- Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2, because String represents each name character by 2 bytes.
- Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
- Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
- A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
- file name length times 2
- Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
- Size of Block. (32 bytes)
- Size of BlockInfo. (64 + 8*replication bytes)
- Reference to the block from INode.blocks (8 bytes)
- HashMap.Entry referencing the block from BlocksMap. (48 bytes)
- References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
| Object |
size estimate (bytes) |
typical size (bytes) |
| File |
224 + 2 * fileName.length |
250 |
| Directory |
264 + 2 * fileName.length |
290 |
| Block |
152 + 72 * replication |
368 |
One of our clusters has
- Files: 10 600 000
- Dirs: 310 000
- Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
- INode.children should be an ArrayList instead of a TreeMap. Already done
HADOOP-1565. (-48 bytes)
- Factor out the INode class into a separate class (-8 bytes)
- Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
- String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
- Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
- Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
- Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
- Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
- Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
| Object |
size estimate (bytes) |
typical size (bytes) |
current typical size (bytes) |
| File |
112 + fileName.length |
125 |
250 |
| Directory |
144 + fileName.length |
155 |
290 |
| Block |
112 + 24 * replication |
184 |
368 |
|
Show » |
made changes - 06/Aug/07 09:48 PM
| Field |
Original Value |
New Value |
|
Description
|
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
{code}
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
{code}
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
{code}
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
{code}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
# Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2, because String represents each name character by 2 bytes.
# Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
# Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2
# Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
# Size of Block. (32 bytes)
# Size of BlockInfo. (64 + 8*replication bytes)
# Reference to the block from INode.blocks (8 bytes)
# HashMap.Entry referencing the block from BlocksMap. (48 bytes)
# References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
||Object||size estimate (bytes)||typical size (bytes)||
|File|224 + 2 * fileName.length|250|
|Directory|264 + 2 * fileName.length|290|
|Block|152 + 72 * replication|368|
One of our clusters has
# Files: 10 600 000
# Dirs: 310 000
# Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
# INode.children should be an ArrayList instead of a TreeMap. Already done HADOOP-1565. (-48 bytes)
# Factor out the INode class into a separate class (-8 bytes)
# Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
# String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
# Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
# Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
# Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
# Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
# Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
||Object||size estimate (bytes)||typical size (bytes)||current typical size (bytes)||
|File|112 + fileName.length|125|250|
|Directory|144 + fileName.length|155|290|
|Block|112 + 24 * replication|184|368|
|
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
{code}
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
{code}
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
{code}
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
{code}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
# Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2, because String represents each name character by 2 bytes.
# Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
# Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2
# Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
# Size of Block. (32 bytes)
# Size of BlockInfo. (64 + 8*replication bytes)
# Reference to the block from INode.blocks (8 bytes)
# HashMap.Entry referencing the block from BlocksMap. (48 bytes)
# References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
||Object||size estimate (bytes)||typical size (bytes)||
|File|224 + 2 * fileName.length|250|
|Directory|264 + 2 * fileName.length|290|
|Block|152 + 72 * replication|368|
One of our clusters has
# Files: 10 600 000
# Dirs: 310 000
# Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
# INode.children should be an ArrayList instead of a TreeMap. Already done HADOOP-1565. (-48 bytes)
# Factor out the INode class into a separate class (-8 bytes)
# Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
# String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
# Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
# Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
# Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
# Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
# Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
||Object||size estimate (bytes)||typical size (bytes)||current typical size (bytes)||
|File|112 + fileName.length|125|250|
|Directory|144 + fileName.length|155|290|
|Block|112 + 24 * replication|184|368|
|
made changes - 01/Sep/07 03:50 PM
|
Status
|
Open
[ 1
]
|
Patch Available
[ 10002
]
|
made changes - 05/Sep/07 12:53 AM
|
Status
|
Patch Available
[ 10002
]
|
Open
[ 1
]
|
made changes - 05/Sep/07 12:56 AM
|
Status
|
Open
[ 1
]
|
Patch Available
[ 10002
]
|
made changes - 06/Sep/07 09:07 PM
|
Status
|
Patch Available
[ 10002
]
|
Open
[ 1
]
|
made changes - 06/Sep/07 09:08 PM
|
Status
|
Open
[ 1
]
|
Patch Available
[ 10002
]
|
made changes - 07/Sep/07 08:55 PM
|
Resolution
|
|
Fixed
[ 1
]
|
|
Status
|
Patch Available
[ 10002
]
|
Resolved
[ 5
]
|
made changes - 05/Nov/07 06:12 PM
|
Status
|
Resolved
[ 5
]
|
Closed
[ 6
]
|
made changes - 08/Jul/09 04:42 PM
|
Component/s
|
dfs
[ 12310710
]
|
|
|
Block map optimization.
Block map (implemented as a HashMap) is a mapping of blocks to data-nodes that contain this block.
Currently a block map entry BlockInfo contains an array of references to DatanodeDescriptors corresponding to the data-nodes this block is replicated to.
Each DatanodeDescriptors in turn contains a collection (implemented as a TreeMap) of references to all blocks that belong to the data-node forming a data-node map.
The TreeMap entries in DatanodeDescriptors are the main contributors to the space overhead the system bears for blocks.
The block map and the data-node map are to be kept consistent at all times with respect to following three operations:
Add/ remove block are individual block operations and they happen often.
The block report is a fairly rare operation but involves processing of all blocks belonging to the data-node. A block report is an array of blocks received by the name-node from the data-node. The name-node scans the array and verifies that its current knowledge of the data-node blocks matches the actual state reported.
I propose the following changes to the existing data structures.
When a block is removed from the block map we re-link adjacent BlockInfo entries for each data-node the block belongs to.
Addition of a block should include it into the corresponding data-node lists. We always include new blocks into the beginning of the list.
When the name-node receives a block report it as usually locks the whole name system so that other updates could not intervene and starts processing the report one block at a time. Before processing the first block it inserts a fake BlockInfo entry into the head of the block list, which will serve as a delimiter between the blocks that have been reported by the data-node and those that were not.
For each block b in the block report we check whether it is contained in the block map. If yes we move it to the head of this data-node block list. If not then we schedule it for removal from this data-node.
When the whole report is processed the blocks that were considered by the name-node belonging to this data-node but turned out be missing there will remain to the right of the delimiting (fake) element initially inserted into the list. The name-node needs either to schedule them for re-replication if insufficient number replicas is present or otherwise just remove the blocks (along with the delimiting entry) from the list corresponding to the reporting data-node.
The BlockInfo structure will be as follows:
BlockInfo { private INodeFile inode; private DatanodeDescriptor[replication] nodes; private BlockInfo[2*replication] neighbors; }assuming that neighbors[2*i] and neighbors[2*i+1] are the previous and the next references respectively, corresponding to the data-node nodes[i].
Space overhead per block is now limited to:
That is 32 + 16*replication total.
Versus 64*replication (see "the size of a block" (5) above).
In a typical case (replication = 3) this saves us 192 - 80 = 112.
Optional further optimization would be to merge two arrays of DatanodeDescriptors and the previous/next array in BlockInfo into one Object[] array of size 3*replication such that for the i-th data-node Object[3i] is the DatanodeDescriptor reference and Object[3i+1] and Object[3i+2] are the respective references to previous and next.
I am not quite sure we should go this way but if we do we will save additional 32 bytes per block.