The initial cacheReport patch at HDFS-5051 sends frequent full reports of DN cache state. A better scheme would mirror how block reports are currently done: send incremental cache reports on every heartbeat (seconds), and full reports on a longer time scale (minutes to hours). This should reduce network traffic and allow us to send incremental reports even more often.
As per the discussion on HDFS-5051, we should also roll up the following review comments:
- Remove gen stamp and length from cacheReport; they are unnecessary until we do auto-caching of appended data
- Only jitter full cache reports, similar to how full block reports are jittered
- On DN startup, skip all cache reports until the cache is populated. The NN can just assume the DN cache is empty in the meantime.
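
To make the proposed behavior concrete, here is a rough sketch of the per-heartbeat decision on the DN side. It is not taken from the HDFS-5051 patch: the class name CacheReportScheduler, the Action enum, and markCachePopulated() are all hypothetical, and the jitter is assumed to be a random offset applied to the first full report only.

```java
import java.util.Random;

/**
 * Hypothetical sketch of the DN-side decision made on each heartbeat.
 * None of these names come from the actual patch.
 */
class CacheReportScheduler {
    enum Action { SKIP, INCREMENTAL, FULL }

    private final long fullIntervalMs;
    private long nextFullReportMs;
    private boolean cachePopulated = false;

    CacheReportScheduler(long nowMs, long fullIntervalMs, Random rng) {
        this.fullIntervalMs = fullIntervalMs;
        // Assumption: jitter is a random offset in [0, fullIntervalMs) applied
        // to the first full report, so DNs do not all report at the same time.
        long jitterMs = (long) (rng.nextDouble() * fullIntervalMs);
        this.nextFullReportMs = nowMs + jitterMs;
    }

    /** Called once the DN has finished populating its cache after startup. */
    void markCachePopulated() {
        cachePopulated = true;
    }

    /** Decide what kind of cache report, if any, to send on this heartbeat. */
    Action onHeartbeat(long nowMs) {
        if (!cachePopulated) {
            // The NN assumes the DN cache is empty until the first report.
            return Action.SKIP;
        }
        if (nowMs >= nextFullReportMs) {
            nextFullReportMs = nowMs + fullIntervalMs;
            return Action.FULL;
        }
        // Incremental reports are never jittered; they go out every heartbeat.
        return Action.INCREMENTAL;
    }
}
```

With this shape, only the full report path carries a timer, so the heartbeat interval can be tuned independently of the full report interval.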