HDFS uses file leases to manage open files. When a file is not closed normally, the NN recovers the lease automatically once the hard limit is exceeded. But a long-running service (e.g. HBase) keeps its HDFS client alive, and that client keeps renewing its lease, so the hard limit is never reached and the NN never gets a chance to recover the file.
Usually the client program has to handle this case itself (e.g. HBase automatically calls recoverLease for files that were not closed normally), but in our experience most services in our company don't handle it properly, which leaves many files stuck in an abnormal, unclosed state or even causes data loss.
This Jira proposes a feature that calls recoverLease automatically when DFSOutputStream close encounters an exception. It should be disabled by default, but anyone building a long-running service on HDFS can enable it.
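A minimal sketch of the pattern, in Java since that is the HDFS client's language. `LeaseRecoverer`, `LeaseRecoveringClose` and `closeWithLeaseRecovery` are hypothetical names, not Hadoop API. The real call they stand in for is `DistributedFileSystem#recoverLease(Path)`, which starts lease recovery and returns `true` if the file is already closed. The sketch is written against plain `java.io.Closeable` so it compiles without Hadoop on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of "call recoverLease when close() fails". All names here are
// hypothetical; with Hadoop, the recoverer could be
// p -> dfs.recoverLease(new Path(p)) on a DistributedFileSystem.
final class LeaseRecoveringClose {
    private LeaseRecoveringClose() {}

    /**
     * Closes the stream. If close() throws, asks for lease recovery on the
     * file, then rethrows the original exception so the caller still learns
     * that the write failed.
     */
    static void closeWithLeaseRecovery(Closeable out, String path, LeaseRecoverer recoverer)
            throws IOException {
        try {
            out.close();
        } catch (IOException closeFailure) {
            try {
                // true: file already closed; false: recovery was started and
                // completes asynchronously on the NameNode side.
                recoverer.recoverLease(path);
            } catch (IOException recoverFailure) {
                // Keep the close failure primary; attach the recovery error.
                closeFailure.addSuppressed(recoverFailure);
            }
            throw closeFailure;
        }
    }

    public static void main(String[] args) {
        // Simulated stream whose close() fails, e.g. after a pipeline error.
        Closeable failing = () -> { throw new IOException("close failed"); };
        try {
            closeWithLeaseRecovery(failing, "/logs/app.log", p -> {
                System.out.println("recoverLease(" + p + ")");
                return false;
            });
        } catch (IOException e) {
            System.out.println("close still reported: " + e.getMessage());
        }
    }
}

/** Hypothetical stand-in for DistributedFileSystem#recoverLease(Path). */
@FunctionalInterface
interface LeaseRecoverer {
    boolean recoverLease(String path) throws IOException;
}
```

Rethrowing keeps the failure visible to the caller. The recovery call only releases the lease so the NN can finish the last block and close the file, instead of waiting for a hard limit that a live client never reaches.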
We have had this feature in our internal Hadoop distribution for more than 3 years, and in our experience it has been quite useful.