Description
I would like to report a resource leak (DFSOutputStream objects) when using the (Java) hadoop-hdfs-client, specifically (at least in my case) when there is a combination of:
- encrypted zones
- quota space exceptions (DSQuotaExceededException)
As you know, when encrypted zones are in play, calling fs.create(path) in the hadoop-hdfs-client returns an HdfsDataOutputStream object, which wraps a CryptoOutputStream, which in turn wraps a DFSOutputStream.
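To make the failure mode concrete, here is a minimal sketch of the client-side pattern involved (the path, class name, and data size are illustrative; it assumes the target directory is inside an encrypted zone and carries a space quota small enough for the write to exceed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

public class QuotaLeakRepro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical path: a directory inside an encrypted zone with a small space quota.
    Path path = new Path("/ez/quota-limited/file.bin");
    try (FSDataOutputStream out = fs.create(path)) {
      // Write enough data to exceed the space quota on the parent directory.
      out.write(new byte[8 * 1024 * 1024]);
    } catch (DSQuotaExceededException e) {
      // close() has already been attempted by try-with-resources at this point,
      // yet the underlying DFSOutputStream stays in DFSClient#filesBeingWritten.
    }
  }
}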
Even though my code correctly calls stream.close() on the stream returned above, I can see from debugging that the underlying DFSOutputStream objects are being leaked.
Specifically, I see the leaked DFSOutputStream objects in the filesBeingWritten map in DFSClient (i.e. they remain in the map even though I have called close() on the stream object).
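For anyone wanting to confirm this independently, below is a small debugging sketch (not the exact check I used) that reads the private filesBeingWritten map via reflection; it assumes the FileSystem is a DistributedFileSystem and that its getClient() accessor (annotated @VisibleForTesting) is reachable from the test code. If the returned count stays above zero after every stream has been close()d, the streams have leaked.

import java.lang.reflect.Field;
import java.util.Map;

import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public final class FilesBeingWrittenProbe {

  // Counts the DFSOutputStream objects that DFSClient still tracks as open.
  // filesBeingWritten is a private field, so this uses reflection and is
  // intended only as debugging scaffolding.
  public static int openStreamCount(DistributedFileSystem fs) throws Exception {
    DFSClient client = fs.getClient();
    Field field = DFSClient.class.getDeclaredField("filesBeingWritten");
    field.setAccessible(true);
    Map<?, ?> filesBeingWritten = (Map<?, ?>) field.get(client);
    return filesBeingWritten.size();
  }
}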
I suspect this is due to a bug in CryptoOutputStream::close:

@Override
public synchronized void close() throws IOException {
  if (closed) {
    return;
  }
  try {
    flush();
    if (closeOutputStream) {
      super.close();
      codec.close();
    }
    freeBuffers();
  } finally {
    closed = true;
  }
}
... whereby if flush() throws (observed in my case when a DSQuotaExceededException is raised because the space quota is exceeded), then the super.close() on the underlying DFSOutputStream is skipped.
In my case I had a space quota set on a directory that is also in an encrypted zone, so each attempt to create and write to a file failed and leaked as described above.
I have attached a speculative patch (hadoop_cryto_stream_close_try_finally.diff) which simply wraps the flush() in a try/finally; a sketch of the shape of the change is below. The patch resolves the problem in my testing.
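For reference, a minimal sketch of that change (based on the description above rather than a verbatim copy of the attached diff): the added finally block guarantees that the underlying stream is closed and the buffers are freed even when flush() throws.

@Override
public synchronized void close() throws IOException {
  if (closed) {
    return;
  }
  try {
    try {
      flush();
    } finally {
      // Runs even if flush() throws (e.g. DSQuotaExceededException), so the
      // wrapped DFSOutputStream is still closed and removed from
      // DFSClient#filesBeingWritten.
      if (closeOutputStream) {
        super.close();
        codec.close();
      }
      freeBuffers();
    }
  } finally {
    closed = true;
  }
}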
Thanks.