Bug 47624 - File Error Data May Have been Lost error while opening commented workbook(excel file)
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.2-FINAL
Hardware: Macintosh other
Importance: P1 major with 3 votes
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Duplicates: 48327
Depends on:
Blocks: 48846 53010
Reported: 2009-08-02 12:40 UTC by Reddy
Modified: 2012-08-12 11:49 UTC (History)

Attachments
Error throwing file attachment (4.50 KB, application/x-msexcel)
2009-08-02 12:40 UTC, Reddy

Description Reddy 2009-08-02 12:40:19 UTC
Created attachment 24083
Error throwing file attachment

When I add a comment to a workbook that already contains comments, then save and reopen it, Excel reports a 'File error: data may have been lost' error.

FYI, I followed the same steps as described at http://poi.apache.org/spreadsheet/quick-guide.html#CellComments (using the HSSF library).

Repro steps:
1. Create a brand new workbook and add a cell comment as described in the quick guide.
2. Write the workbook.
3. Read the already commented workbook.
4. Add another cell with a comment.
5. Write the workbook again.
6. Open the workbook in MS Excel 2003 or 2007 (Windows XP Pro) or Excel 2008 for Mac. Each reports the following error: 'File error: data may have been lost'. (I can still open the document, though.)

POI version:
- Tried with POI 3.2-FINAL and POI 3.5 beta 6. Same error both times.

I have attached a file that triggers the error. You can also create one by following the quick-guide steps exactly, except that the second pass reads the existing file instead of creating a new workbook.
Comment 1 nothize 2010-03-18 07:43:34 UTC
I've experienced this problem too.

After some tracing through the 3.6 stable release source, the problem seems to lie in:

Sheet.java: public int aggregateDrawingRecords(DrawingManager2 drawingManager, boolean createIfMissing).

Lines 1491-1499:

        EscherAggregate r = EscherAggregate.createAggregate( records, loc, drawingManager );
        int startloc = loc;
        while ( loc + 1 < records.size()
                && records.get( loc ) instanceof DrawingRecord
                && records.get( loc + 1 ) instanceof ObjRecord )
        {
            loc += 2;
        }


whereas the corresponding loop over DrawingRecord and ObjRecord pairs inside EscherAggregate looks like this:

		while ( loc + 1 < records.size()
				&& sid( records, loc ) == DrawingRecord.sid
				&& isObjectRecord( records, loc + 1 ) )

Thus the loop in Sheet that computes loc terminates too early: a DrawingRecord may be followed by a TextObjectRecord rather than an ObjRecord, and those pairs need to be taken into account too.

I'll try to patch it locally and observe how it goes. The latest development branch seems to have been refactored a lot, and Sheet.java has been renamed to InternalSheet.java, though the code around this issue is still similar.
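To make the difference between the two loops concrete, here is a small stand-alone sketch. It does not use POI: the Kind enum and both scan methods are hypothetical stand-ins for the record classes and loops quoted above, and the sample record sequence is an assumed layout for a single comment, not data taken from the attached file.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone model of the two scanning loops; no POI classes are involved.
final class DrawingScanSketch {
    // Hypothetical stand-ins for DrawingRecord, ObjRecord, TextObjectRecord, NoteRecord.
    enum Kind { DRAWING, OBJ, TEXT_OBJ, NOTE, OTHER }

    // Models the loop quoted from Sheet.java: only DRAWING + OBJ pairs are skipped.
    static int scanObjOnly(List<Kind> records, int loc) {
        while (loc + 1 < records.size()
                && records.get(loc) == Kind.DRAWING
                && records.get(loc + 1) == Kind.OBJ) {
            loc += 2;
        }
        return loc;
    }

    // Models the loop quoted from EscherAggregate: TEXT_OBJ also counts as an object record.
    static int scanObjOrTextObj(List<Kind> records, int loc) {
        while (loc + 1 < records.size()
                && records.get(loc) == Kind.DRAWING
                && isObjectRecord(records.get(loc + 1))) {
            loc += 2;
        }
        return loc;
    }

    static boolean isObjectRecord(Kind kind) {
        return kind == Kind.OBJ || kind == Kind.TEXT_OBJ;
    }

    public static void main(String[] args) {
        // Assumed layout: DRAWING/OBJ pair, DRAWING/TEXT_OBJ pair, then a NOTE.
        List<Kind> records = Arrays.asList(
                Kind.DRAWING, Kind.OBJ, Kind.DRAWING, Kind.TEXT_OBJ, Kind.NOTE);
        System.out.println(scanObjOnly(records, 0));      // stops after the first pair
        System.out.println(scanObjOrTextObj(records, 0)); // consumes both pairs
    }
}
```

In this model the first scan leaves the DRAWING/TEXT_OBJ pair behind, which matches the kind of leftover records described in this comment.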
Comment 2 nothize 2010-03-18 07:47:00 UTC
*** Bug 48327 has been marked as a duplicate of this bug. ***
Comment 3 nothize 2010-03-18 08:44:26 UTC
A simple proof-of-concept workaround is below.

Here, Invoker is a reflection utility that force-reads a private field.

-----------------------
    if ( null == drawing ) {
        drawing = sh.createDrawingPatriarch();
        // Remove redundant records to avoid error.

        Sheet _sh = (Sheet)Invoker.getProperty(sh, "_sheet");
        List list = _sh.getRecords();
        for ( Iterator it = list.iterator(); it.hasNext(); ) {
            RecordBase e = (RecordBase)it.next();
            if ( e instanceof TextObjectRecord || e instanceof DrawingRecord || e instanceof ObjRecord ) {
                it.remove();
            }
        }
    }
-----------------------

The idea is that Sheet.aggregateDrawingRecords(..), called by HSSFSheet.createDrawingPatriarch(), fails to remove all of the existing records and also breaks the pairing of DrawingRecord and *ObjRecord, which causes the error prompt when Excel opens the file.
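The Invoker helper itself is not shown in this report. As a rough illustration only, a reflection utility of that kind might look like the sketch below. The method body, the Wrapper demo class, and its field value are all assumptions made for the example; they are not the commenter's actual code and not part of POI.

```java
import java.lang.reflect.Field;

// Hypothetical helper: reads a private field by name, searching superclasses too.
final class Invoker {
    private Invoker() {}

    static Object getProperty(Object target, String fieldName) {
        Class<?> type = target.getClass();
        while (type != null) {
            try {
                Field field = type.getDeclaredField(fieldName);
                field.setAccessible(true); // bypass the private modifier
                return field.get(target);
            } catch (NoSuchFieldException e) {
                type = type.getSuperclass(); // keep looking further up the hierarchy
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("No field named " + fieldName);
    }
}

// Demo type standing in for a class that hides an internal object in a private field.
final class InvokerDemo {
    static final class Wrapper {
        private final String _sheet = "internal sheet";
    }

    public static void main(String[] args) {
        System.out.println(Invoker.getProperty(new Wrapper(), "_sheet"));
    }
}
```

Reading private state this way depends on internal field names, so it is only suitable as a stop-gap workaround.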

The patch to Sheet.java will follow.
Comment 4 nothize 2010-03-18 09:52:17 UTC
Further testing reveals that NoteRecord also contributes to the error prompt.

If any of the new comments are in the same location as the old comments, the redundant NoteRecord will cause Excel to report an error upon opening.

Removing the NoteRecord manually seems to solve this problem.
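As a sketch of the clean-up idea only, the following stand-alone example keeps a single note per cell location. The Note class and keepLatestPerCell method are hypothetical models written for this illustration; they are not POI's NoteRecord or any POI API, and choosing the most recently added note as the one to keep is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone model of dropping redundant notes that share a cell location.
final class NoteDedupSketch {
    // Hypothetical stand-in for a note record: just a cell position and a label.
    static final class Note {
        final int row;
        final int col;
        final String label;

        Note(int row, int col, String label) {
            this.row = row;
            this.col = col;
            this.label = label;
        }
    }

    // Keeps one note per (row, col); a later note replaces an earlier one at the same cell.
    static List<Note> keepLatestPerCell(List<Note> notes) {
        Map<String, Note> byCell = new LinkedHashMap<>();
        for (Note note : notes) {
            byCell.put(note.row + ":" + note.col, note);
        }
        return new ArrayList<>(byCell.values());
    }

    public static void main(String[] args) {
        List<Note> notes = new ArrayList<>();
        notes.add(new Note(3, 1, "old"));
        notes.add(new Note(5, 2, "other"));
        notes.add(new Note(3, 1, "new"));
        for (Note note : keepLatestPerCell(notes)) {
            System.out.println(note.row + "," + note.col + " " + note.label);
        }
    }
}
```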
Comment 5 nothize 2010-03-18 10:41:14 UTC
The related issues are beyond my knowledge.

I have opted not to write the patch and will just derive a workaround for my own case.

For those who want to solve the problem, try removing the NoteRecord after calling Sheet.createDrawingPatriarch(..).

NoteRecord reading and re-writing is not supported in 3.6 either.
Comment 6 Yegor Kozlov 2011-06-25 13:51:23 UTC
It is a limitation of HSSF: comments are graphic objects, and HSSF can create drawings from scratch but cannot modify existing ones. This means that if you add a comment to a sheet that already has graphic objects (comments, shapes, pictures, etc.), the existing graphic objects are invalidated.

As a workaround, try writing the output in .xlsx format; it should handle comments across re-saves without problems.

Yegor

*** This bug has been marked as a duplicate of bug 50696 ***
Comment 7 Evgeniy Berlog 2012-08-12 11:49:57 UTC
This problem should be fixed in trunk.

Please try with a nightly build - see download links on http://poi.apache.org/
or build yourself from SVN trunk, see http://poi.apache.org/subversion.html