Currently DiskIoMgr starts all of its I/O threads upfront for every supported filesystem. This means most impalads have hundreds of idle threads that never do anything. It would be sensible to start the threads for a disk only when the first range is submitted to it. It's not immediately obvious where the best place to do this is. A couple of ideas:
- Try to do it in ScheduleContext in a lightweight way, e.g. check an atomic to see if the threads have been initialised, and if not, acquire a lock and create them. Propagating the status if thread creation fails may be the tricky part.
- Start up one thread per disk, so I/O can always make progress, and start an extra thread per disk each time a range is pulled off the queue in DiskQueue::GetNextRequestRange(), so that the number of threads ramps up as scan ranges are submitted. It could potentially be clever and track how many threads are parked, only creating a new thread when none are parked.
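
A rough sketch of the first idea, to make the atomic-then-lock check and the status propagation concrete. It is not based on the actual DiskIoMgr code: Status here is a minimal stand-in for Impala's Status, and LazyDiskThreads, EnsureStarted() and work_loop are hypothetical names. It also assumes the work loop returns on its own, whereas real disk threads would need a shutdown signal before they could be joined.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <system_error>
#include <thread>
#include <utility>
#include <vector>

// Minimal stand-in for a status type (hypothetical, not Impala's Status).
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return Status{true, ""}; }
  static Status Error(std::string m) { return Status{false, std::move(m)}; }
};

// Hypothetical per-disk holder that creates its threads on first use.
class LazyDiskThreads {
 public:
  LazyDiskThreads(int num_threads, std::function<void()> work_loop)
    : num_threads_(num_threads), work_loop_(std::move(work_loop)) {
    threads_.reserve(num_threads_);
  }

  // Joins any threads that were started. Assumes work_loop returns by itself.
  ~LazyDiskThreads() {
    for (std::thread& t : threads_) {
      if (t.joinable()) t.join();
    }
  }

  // Meant to be called on the scheduling path before a range is enqueued.
  Status EnsureStarted() {
    // Fast path: a single atomic load once the threads exist.
    if (started_.load(std::memory_order_acquire)) return Status::OK();

    std::lock_guard<std::mutex> l(lock_);
    // Re-check under the lock in case another caller got here first.
    if (started_.load(std::memory_order_relaxed)) return Status::OK();
    try {
      // Threads created by an earlier failed attempt are kept, so a retry
      // only creates the ones that are still missing.
      while (static_cast<int>(threads_.size()) < num_threads_) {
        threads_.emplace_back(work_loop_);
      }
    } catch (const std::system_error& e) {
      // started_ stays false, so the next caller retries. The error is
      // returned to the caller rather than swallowed.
      return Status::Error(e.what());
    }
    started_.store(true, std::memory_order_release);
    return Status::OK();
  }

  bool started() const { return started_.load(std::memory_order_acquire); }

 private:
  const int num_threads_;
  std::function<void()> work_loop_;
  std::atomic<bool> started_{false};
  std::mutex lock_;                   // Guards thread creation.
  std::vector<std::thread> threads_;  // Only modified while holding lock_.
};
```

In this sketch the caller that triggers thread creation gets the failure back as a Status and can fail its own request, which is one possible answer to the propagation question. Callers after a successful start pay only for one atomic load.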