While testing Spark with HDFS erasure coding (new in Hadoop 3), we ran into a bug with the event logs. The main issue was a bug in HDFS (HDFS-14027), but it did make us wonder whether Spark should be using EC for event log files in general. It's a poor choice because EC currently implements hflush() and hsync() as no-ops, which means you won't see anything in your event logs until the application is complete, so the history server can't show it as an in-progress application. That isn't necessarily a bug, but it isn't great either. So I think we should ensure EC is always off for event logs.
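For illustration, here is one way a writer could keep a specific file replicated even when the parent directory has an EC policy, using the Hadoop 3 output-stream builder API (replicate() on DistributedFileSystem tells the NN to ignore the inherited EC policy for that file). This is just a minimal sketch, and the createNonEcFile helper name is made up here, not an existing Spark API:

{code:scala}
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// Sketch: open an event log file with plain replication, overriding any
// erasure coding policy inherited from the parent directory.
def createNonEcFile(fs: FileSystem, path: Path): FSDataOutputStream = fs match {
  case dfs: DistributedFileSystem =>
    // replicate() forces a replicated file regardless of the directory's EC
    // policy; recursive() makes the builder create missing parent dirs,
    // which the builder API (unlike the old create()) doesn't do by default.
    dfs.createFile(path).replicate().recursive().build()
  case _ =>
    // Other filesystems don't have HDFS EC, so the normal path is fine.
    fs.create(path)
}
{code}

An admin can also pin the whole log directory to replication (hdfs ec -setPolicy with the -replicate flag), but doing it in the writer protects users who don't control the directory's policy.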
IIUC there is not a problem with applications that die without properly closing the output stream. It'll take a while for the NN to realize the client is gone and finalize the block, but the data should get there eventually.
The space savings from EC would be nice, since event logs can get fairly large, but I think the factors above outweigh them.