Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: None
Description
The default behavior of an HDFS write is to set up a pipeline. A file is broken into packets, and each packet is sent through the pipeline. Pipelining provides good throughput, but latency suffers because every packet passes through the DataNodes one after another.
Letting a client specify a fan-out strategy would allow it to send each packet to the DataNodes concurrently instead of passing the packet through a pipeline serially.
# Pipeline
C |-------> DN -------> DN -------> DN

# Fan Out
  |-------> DN
C |-------> DN
  |-------> DN
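A minimal sketch of the fan-out idea, not taken from the HDFS code base: the client hands the same packet to every DataNode at once and then waits for the ACKs. The `PacketSender` interface, the `sendFanOut` method, and the DataNode names are hypothetical and exist only for illustration; a real implementation would go through the HDFS data transfer protocol.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class FanOutSketch {
    /** Hypothetical transport: sends one packet to one DataNode, returns true on ACK. */
    interface PacketSender {
        boolean send(String dataNode, byte[] packet);
    }

    /**
     * Fan-out write of a single packet (illustrative only).
     * All sends are started before any ACK is awaited, so the client waits
     * for the sends in parallel rather than one hop after another.
     */
    static boolean sendFanOut(List<String> dataNodes, byte[] packet, PacketSender sender) {
        // Start one asynchronous send per DataNode; collect() forces all of
        // them to be submitted before we block on any result.
        List<CompletableFuture<Boolean>> acks = dataNodes.stream()
                .map(dn -> CompletableFuture.supplyAsync(() -> sender.send(dn, packet)))
                .collect(Collectors.toList());

        // The packet counts as written only if every DataNode acknowledged it.
        return acks.stream().allMatch(CompletableFuture::join);
    }
}
```

This variant still waits for every replica. The trade-off is that the client's outbound bandwidth is shared across all replicas, which is why pipelining tends to win on throughput.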
Also, if there is a 'min replication' of, for example, 2, the client only needs to wait for the first 2 ACKs before writing the next packet, as long as the 2 ACKs are from different racks. The block placement rules may need to support this.
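The ACK rule above can be sketched as a small check over ACKs in arrival order. This is only one possible reading of the proposal: the `Ack` class, the `acksNeeded` method, and the separate `minRacks` parameter are assumptions made for illustration and do not correspond to existing HDFS APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MinReplicationAckSketch {
    /** Hypothetical ACK record: which DataNode acknowledged and which rack it is on. */
    static final class Ack {
        final String dataNode;
        final String rack;

        Ack(String dataNode, String rack) {
            this.dataNode = dataNode;
            this.rack = rack;
        }
    }

    /**
     * Returns how many ACKs (in arrival order) the client must receive before
     * it may send the next packet, or -1 if the arrivals never satisfy the rule.
     * Rule assumed here: at least minReplication ACKs, spanning at least
     * minRacks distinct racks.
     */
    static int acksNeeded(List<Ack> arrivals, int minReplication, int minRacks) {
        Set<String> racks = new HashSet<>();
        for (int i = 0; i < arrivals.size(); i++) {
            racks.add(arrivals.get(i).rack);
            int received = i + 1;
            if (received >= minReplication && racks.size() >= minRacks) {
                return received;
            }
        }
        return -1;
    }
}
```

With a min replication of 2 and 2 required racks, ACKs from `rackA` then `rackB` release the client after the second ACK, while two ACKs from the same rack force it to wait for a third ACK from elsewhere.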
HBase requires this improved latency.