If there is profiling enabled for mapper or reducer then hprof dumps profile.out at process exit. It is dumped after task signaled to AM that work is finished.
AM kills container with finished work without waiting for hprof to finish dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3 works) , it could not finish dump in time before being killed making entire dump unusable because cpu and heap stats are missing.
There needs to be better delay before container is killed if profiling is enabled.