> My understanding of visible length is "the length that all datanodes in the pipeline contain at least such amount of data."
There is no trusted source to obtain such information, unless you keep it in ZooKeeper or want to address the Byzantine Generals' Problem internally, which we don't.
Let me try to explain the notion of visible length.
As per the design doc, visible length is the "number of bytes that have been acknowledged by the downstream DataNodes". It is specific to a replica, not to a block, so it can differ between replicas at a given time. In the document it is called BA (bytes acknowledged), as opposed to BR (bytes received).
If we have 3 replicas r1, r2, r3 in a pipeline, then all of them could have received the same number of bytes:
r1.BR = r2.BR = r3.BR,
but visible lengths are different, because r3 hasn't acknowledged the latest packet to r2 and r1. Until then
r3.BA = r3.BR
r2.BA = r2.BR - p
r1.BA = r1.BR - p
where p is the packet length.
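
The state above can be reproduced with a small model. This is only a sketch, not the DataNode implementation: the `Replica` class, the helper names, and the packet sizes are made up for illustration, and ack propagation is collapsed into a single step.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    bytes_received: int = 0  # BR
    bytes_acked: int = 0     # BA, i.e. this replica's visible length


def receive_packet(pipeline, packet_len):
    """Every replica receives the packet; only the last one can ack at once."""
    for replica in pipeline:
        replica.bytes_received += packet_len
    # The last replica has no downstream, so all it has received is visible.
    last = pipeline[-1]
    last.bytes_acked = last.bytes_received


def propagate_ack(pipeline):
    """Simplification: the ack from the last replica reaches all upstream ones."""
    acked = pipeline[-1].bytes_acked
    for replica in pipeline[:-1]:
        replica.bytes_acked = acked


p = 512  # hypothetical packet length
pipeline = [Replica("r1"), Replica("r2"), Replica("r3")]

receive_packet(pipeline, 1024)  # earlier data, fully acknowledged
propagate_ack(pipeline)

receive_packet(pipeline, p)     # latest packet, ack from r3 not yet propagated
r1, r2, r3 = pipeline
```

At this point all three replicas have the same BR, r3.BA equals r3.BR, and r1 and r2 are behind by exactly p until `propagate_ack` runs again.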
Now, when a client reads a byte, it first checks with one of the replicas, say r3, whether the byte is visible. The last received byte is visible in r3, which means the client can read it from any replica. When the client reads that byte from r1, it sends r1 the visible length it obtained from r3. The DN holding r1 sees that the client has already confirmed with another replica that the byte is visible there, and lets the client read it, even though the byte is not yet locally visible.
So our consistency guarantee is that after a client has read a byte from one replica, that client (or any other client aware of that fact) can read the same byte from any other replica.
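
The read check described above can be sketched as follows. Again, this is an illustration under assumptions, not the actual DataNode code: `Replica`, `can_read`, and the `client_visible_length` parameter are hypothetical names, and capping at the bytes physically received is my own safety assumption.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    bytes_received: int  # BR
    bytes_acked: int     # BA, the locally visible length

    def can_read(self, offset, client_visible_length=0):
        """Decide whether the byte at `offset` may be served to a client.

        The client may pass a visible length it obtained from another
        replica; the larger of that and the local BA is honoured.
        """
        effective = max(self.bytes_acked, client_visible_length)
        # Assumption: never serve bytes this replica has not received yet.
        return offset < min(effective, self.bytes_received)


# r3 has acknowledged the latest 512-byte packet, r1 has not seen the ack yet.
r1 = Replica("r1", bytes_received=1536, bytes_acked=1024)
r3 = Replica("r3", bytes_received=1536, bytes_acked=1536)

last_byte = r3.bytes_received - 1
visible_at_r3 = r3.bytes_acked  # what the client learned from r3

served_by_r3 = r3.can_read(last_byte)
served_by_r1_without_hint = r1.can_read(last_byte)
served_by_r1_with_hint = r1.can_read(last_byte, visible_at_r3)
```

Without the visible length from r3, r1 would refuse the last byte; with it, r1 serves the byte, which is the guarantee stated above.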