We've seen this failure in several recent builds:
After some investigation, the problem seems to be caused by an unexpected partition leader change, which the controller triggers proactively when the preferred leader becomes alive again. The test currently assumes that it is safe to restart the broker as soon as it observes a leadership change, since such a change is typically caused by a ZooKeeper session timeout. In this case, however, the session has not actually expired when the leadership change occurs. So after starting up, the broker sees its brokerId still registered in ZooKeeper and immediately shuts down, which causes the test failure above. To fix the problem, we should probably add a stronger check to ensure that the broker has actually been deregistered from ZooKeeper before restarting it.
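One possible shape for the stronger check is sketched below: instead of treating a leadership change as proof of session expiry, the test polls until the broker's registration is really gone, and only then restarts. This is a minimal sketch, not the actual fix. The class `BrokerRestartGuard`, the method `awaitDeregistered`, and the `isRegistered` supplier are hypothetical names introduced for illustration. In a real test the supplier would presumably wrap an existence check on the broker's ephemeral registration node in ZooKeeper; here it is simulated so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper: not part of Kafka's test utilities.
public class BrokerRestartGuard {

    /**
     * Polls isRegistered until it reports false or timeoutMs elapses.
     * Returns true only if the broker was observed to be deregistered,
     * i.e. it is safe to restart it.
     */
    public static boolean awaitDeregistered(BooleanSupplier isRegistered,
                                            long timeoutMs,
                                            long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (!isRegistered.getAsBoolean()) {
                return true; // registration is gone
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // still registered when the timeout expired
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt and report "not safe to restart".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: a simulated registration that disappears on the third
        // poll stands in for the real ZooKeeper lookup.
        AtomicInteger polls = new AtomicInteger();
        BooleanSupplier simulated = () -> polls.incrementAndGet() < 3;

        boolean safe = awaitDeregistered(simulated, 5_000, 10);
        System.out.println("safe to restart: " + safe);
    }
}
```

With a check like this, the test would fail fast with a clear timeout if the session never expires, rather than restarting a broker that then shuts itself down because its brokerId is still registered.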