A couple of different use cases cause storms of reads to META during startup. For example, a large MR job will cause each map task to hit META, since each task starts with an empty region location cache.
A few possible improvements have been proposed:
- MR jobs could ship a copy of META for the table in the DistributedCache
- Clients could prewarm their cache by doing one large scan of all the META rows for the table instead of a random read for each miss
- Each miss could fetch ahead some number of rows in META
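
To get a rough feel for the last two options, here is a toy simulation. It does not use HBase client code: `MetaTable`, `RegionCache`, `scan_from`, `fetch_ahead` and `prewarm` are all hypothetical names invented for this sketch. It only counts how many round trips to META a client with an empty cache would make. It assumes rows are looked up in key order, which is the best case for fetch-ahead; a random access pattern would benefit less.

```python
import bisect


class MetaTable:
    """Toy stand-in for META: the regions of one table, sorted by start key."""

    def __init__(self, start_keys):
        keys = sorted(start_keys)
        # Each region is (start_key, end_key); the last region has end_key None.
        self.regions = list(zip(keys, keys[1:] + [None]))
        self.starts = keys
        self.reads = 0  # number of round trips served

    def scan_from(self, row, limit):
        """One round trip: up to `limit` regions, starting at the one holding `row`."""
        self.reads += 1
        i = max(bisect.bisect_right(self.starts, row) - 1, 0)
        return self.regions[i:i + limit]


class RegionCache:
    """Toy client-side region location cache (hypothetical, not HBase code)."""

    def __init__(self, meta, fetch_ahead=1):
        self.meta = meta
        self.fetch_ahead = fetch_ahead  # rows of META to pull in on each miss
        self.starts = []                # sorted start keys of cached regions
        self.ends = {}                  # start key -> end key

    def _add(self, regions):
        for start, end in regions:
            if start not in self.ends:
                bisect.insort(self.starts, start)
            self.ends[start] = end

    def prewarm(self):
        # A single large scan covering every region of the table.
        self._add(self.meta.scan_from("", len(self.meta.regions)))

    def locate(self, row):
        """Return the start key of the region holding `row`, reading META on a miss."""
        i = bisect.bisect_right(self.starts, row) - 1
        if i >= 0:
            start = self.starts[i]
            end = self.ends[start]
            if end is None or row < end:
                return start  # cache hit, no META read
        fetched = self.meta.scan_from(row, self.fetch_ahead)
        self._add(fetched)
        return fetched[0][0]


def meta_reads(fetch_ahead=1, prewarm=False, num_regions=100):
    """Count META round trips when touching one row in every region, in key order."""
    starts = [""] + [f"{i:03d}" for i in range(1, num_regions)]
    meta = MetaTable(starts)
    cache = RegionCache(meta, fetch_ahead)
    if prewarm:
        cache.prewarm()
    for i in range(num_regions):
        cache.locate(f"{i:03d}x")
    return meta.reads


if __name__ == "__main__":
    print("one read per miss:", meta_reads(fetch_ahead=1))
    print("fetch ahead 10:   ", meta_reads(fetch_ahead=10))
    print("prewarm scan:     ", meta_reads(prewarm=True))
```

In this model the number of round trips per client drops roughly in proportion to the fetch-ahead size, and a prewarm scan reduces it to one. The cost not modeled here is the size of each response, which grows with the number of rows returned.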