The way we store data on disk is akin to striping (RAID 0): losing one disk means that the data remaining on the other disks isn't recoverable.
After replacing a bad disk, users would see something like the following:
The above shows a tablet server detecting that one data folder is empty while the other folders still contain data, and crashing as a result. The current workaround is to manually delete the data in all the remaining Kudu folders.
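To make the failure mode concrete, here is a minimal sketch of the kind of startup check described above. It is not Kudu's actual code: the function names, the "fresh"/"existing" labels, and the error message are all hypothetical, and it only assumes that a freshly replaced disk shows up as an empty (or missing) directory.

```python
import os


def classify_data_dirs(data_dirs):
    """Split the configured directories into (empty, populated) lists."""
    empty, populated = [], []
    for d in data_dirs:
        # Assumption: a missing directory is treated like an empty one,
        # which is what a freshly replaced disk would look like.
        if not os.path.isdir(d) or not os.listdir(d):
            empty.append(d)
        else:
            populated.append(d)
    return empty, populated


def check_data_dirs(data_dirs):
    """Hypothetical consistency check run at tablet server startup.

    Returns "fresh" if no directory holds data, "existing" if all of them
    do, and raises if the set is mixed (the situation described above).
    """
    empty, populated = classify_data_dirs(data_dirs)
    if empty and populated:
        # With striped data, the surviving directories cannot be used
        # without the lost one, so the sketch refuses to start.
        raise RuntimeError(
            "inconsistent data directories: empty=%s populated=%s"
            % (empty, populated)
        )
    return "fresh" if not populated else "existing"
```

Under this sketch, the workaround of deleting the data in the remaining folders amounts to forcing every directory into the empty state so that the check passes.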
As we fix this, one thing to keep in mind is that WALs can only be stored on a single disk, so even if we tolerate data disk failures, that would not help if the WAL disk dies.