Bug 53780 - memory, temporary file and file handle leak in SXSSF
Summary: memory, temporary file and file handle leak in SXSSF
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 3.8-FINAL
Hardware: All All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 53493
  Show dependency tree
 
Reported: 2012-08-24 22:46 UTC by David Pletcher
Modified: 2012-09-14 14:04 UTC (History)
0 users



Attachments
patch to fix leaks of memory and temporary files (3.25 KB, text/plain)
2012-08-24 22:46 UTC, David Pletcher
Details
patch to fix bug (5.06 KB, patch)
2012-08-27 17:00 UTC, David Pletcher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Pletcher 2012-08-24 22:46:04 UTC
Created attachment 29278 [details]
patch to fix leaks of memory and temporary files

The current SXSSF implementation leaks memory by calling File.deleteOnExit(). It is recommended that the deleteOnExit method should never be called in a persistent application; see discussion here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513817

Additionally, temporary files are not cleaned up until the program exits, as noted in this POI bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53493

This patch is submitted as an alternative to the fix proposed in the aforementioned bug. I'm reporting a separate bug in order to encompass the additional concern about memory leaks.

Here's how the patch works:
* all calls to deleteOnExit are eliminated.
* the SXSSFWorksheet class walks through all sheets after the output is written, deleting each associated temporary file immediately.
* close and delete operations are moved into finally blocks, ensuring cleanup even when an exception causes premature failure.

These changes are vitally important for our high-throughput, persistent server that will pound out million row Excel spreadsheets all day long. (We've patched the POI library internally but it would be great to factor these changes into the distribution.)

I would volunteer to create unit tests for this improvement but there isn't an existing unit testing framework for the SXSSF component, so far as I can tell. I'm open to contributing on that front in the future, if there's interest.

Thanks!
Comment 1 Alex Geller 2012-08-27 08:20:29 UTC
I think this is a duplicate of bug 53493. All that was said there applies here too. This patch suffers from the same shortcomings as the patch proposed in that bug. I vote for a straightforward solution SXSSFWorkbook.dispose() instead of an implicit destroy-on-write which is not in accordance with the contract for the function SXSSFWorkbook.write(). The File.deleteOnExit() should be removed if it indeed leaks in the scenario it is used in SXSSF (File.deleteOnExit() is needed only if the garbage collector couldn't collect the sheets by the time the program exits. Are you perhaps keeping references to your workbooks or sheets that prevent garbage collection and subsequent temp file deletion?).
Comment 2 David Pletcher 2012-08-27 16:23:09 UTC
Thank you for the quick response.

I reported this as a new issue because it encompasses an additional problem, the memory leak via deleteOnExit. I also included the part about temporary file cleanup (bug 53493) because the problems are entangled; it's convenient to address both symptoms at the same time.

The generally accepted wisdom about finalize methods is not to use them, ever. It isn't a reliable mechanism for doing anything; one should never assume that finalizers will run, and the object graph tends to be seen in an inconsistent state within these methods, often causing weird and unpredictable problems. The patch for 53493 that calls finalize() explicitly is a direct violation of the finalize contract; it must only run once, if ever. See Effective Java or this stackoveflow thread:
http://stackoverflow.com/questions/158174/why-would-you-ever-implement-finalize

Solving the problem with a new dispose method seems reasonable; I'll submit a revised patch today.
Comment 3 David Pletcher 2012-08-27 16:32:04 UTC
Another note: file handles will be leaked if there's an exception during the workbook write process. I'm fixing that problem in the patch as well.
Comment 4 David Pletcher 2012-08-27 17:00:39 UTC
Created attachment 29285 [details]
patch to fix bug

Revised as specified by Alex Geller. Changes since last patch:
* Renamed cleanup methods as dispose
* Moved temporary file cleanup to a separate public dispose method
* Added a unit test
Comment 5 Yegor Kozlov 2012-09-04 13:50:05 UTC
Thanks for the patch, I put it in my TODO list to review and apply. 

Yegor

(In reply to comment #4)
> Created attachment 29285 [details]
> patch to fix bug
> 
> Revised as specified by Alex Geller. Changes since last patch:
> * Renamed cleanup methods as dispose
> * Moved temporary file cleanup to a separate public dispose method
> * Added a unit test
Comment 6 Yegor Kozlov 2012-09-14 14:01:05 UTC
Patch applied in r1384784 with some tweaks. 

SheetDataWriter.dispose must close the corresponding Writer object before deleting the temp file. Without this change your unit test fails on Windows.  

I also tweaked SXSSF unit tests to cleanup temp files: every new instance of SXSSFWorkbook in the tests must be accompanied by workbook.dispose(). We no longer use the trick with File.deleteOnExit and must free the resources explicitly.  

Regards,
Yegor