Yeah, a corrupted offset file would lead to this (though it could also be some other bug). We do shut down the broker on any I/O error, since that means we don't know the state of the data on disk and need to run recovery. Do you have the log from that previous shutdown?
If the offset checkpoint is corrupt, I think the desired behavior is for the node to crash. So in that case I think the problem is that we throw a NumberFormatException, which we probably don't handle correctly, instead of an IOException, which would cause the broker to shoot itself in the head.
Let's do this: I'll fix the parsing logic on trunk so that any unparsable file throws IOException. This will let us gracefully handle corruption in the file. I'm still not convinced that this is a file corruption thing and not just some bug in our code, but without the actual file it's a little hard to know. If you can reproduce it on another machine, that proves it is a bug. If so, grab the file; I suspect it will give a clue about what is going on.
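
For illustration, here is a rough sketch of the kind of parsing change I mean. It is not the actual trunk code: the class name `CheckpointParser`, the `parse` method, and the "topic partition offset" line format are all made up for the example. The point is only that a bad number gets rethrown as IOException rather than escaping as NumberFormatException.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for illustration only. The one-entry-per-line
// "topic partition offset" format is an assumption, not the real file layout.
final class CheckpointParser {

    // Returns a map from "topic-partition" to offset.
    // Any line that cannot be parsed is reported as an IOException.
    static Map<String, Long> parse(List<String> lines) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                throw new IOException(
                    "Malformed checkpoint line " + lineNo + ": '" + line + "'");
            }
            try {
                int partition = Integer.parseInt(parts[1]);
                long offset = Long.parseLong(parts[2]);
                offsets.put(parts[0] + "-" + partition, offset);
            } catch (NumberFormatException e) {
                // Convert the parse failure into an I/O error so the caller's
                // existing I/O error handling sees it, keeping the cause.
                throw new IOException(
                    "Malformed checkpoint line " + lineNo + ": '" + line + "'", e);
            }
        }
        return offsets;
    }
}
```

With something like this, a corrupt entry goes down the same path as any other I/O error instead of surfacing as an unhandled exception.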