Bug 53810 - [PATCH] fix for incorrect loop detection in NPOIFS
Summary: [PATCH] fix for incorrect loop detection in NPOIFS
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.8-FINAL
Hardware: PC Mac OS X 10.4
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-01 00:19 UTC by Gary King
Modified: 2013-02-04 12:52 UTC

Attachments
patch fixing cycle detection in NPOI (1.68 KB, patch)
2012-09-01 00:21 UTC, Gary King

Description Gary King 2012-09-01 00:19:28 UTC
While upgrading our application to use Tika 1.2 (previously Tika 0.9), a few PowerPoint 97-03 (PPT) files which previously parsed correctly started failing with exceptions in NPOIFS.

The root cause appears to be a difference in how BAT entries are read from XBAT blocks in POIFSFileSystem versus NPOIFSFileSystem. In POIFS, the header's getBATCount is used as a hard limit on the number of BATs that are read; in NPOIFS, XBATEntriesPerBlock entries are read for every XBAT, even if this causes more total BAT entries to be read than header.getBATCount. In some files, the extraneous BAT entries are all initialized to the same value, which is then detected as a possible cycle.

The attached PPT file demonstrates this problem (it was found via a web-crawler search for test content, so I cannot grant a license to Apache to redistribute it). The attached patch implements behavior in NPOIFS similar to what exists in POIFS, and allows the file to parse without an exception.
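
For readers without the patch at hand, the difference described above can be illustrated roughly as follows. This is only a sketch, not the POI code or the attached patch: the class and method names are hypothetical, each XBAT block is modeled as a plain int array, and the BAT offsets held directly in the header and the XBAT chain pointer are ignored. Loop detection is reduced to "the same offset was seen twice".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from POI.
public class XbatReadSketch {

    /** Uncapped read: takes every entry of every XBAT block. */
    static List<Integer> readAllEntries(int[][] xbatBlocks) {
        List<Integer> batOffsets = new ArrayList<>();
        for (int[] xbat : xbatBlocks) {
            for (int offset : xbat) {
                batOffsets.add(offset);
            }
        }
        return batOffsets;
    }

    /** Capped read: stops once batCount offsets have been collected. */
    static List<Integer> readCapped(int[][] xbatBlocks, int batCount) {
        List<Integer> batOffsets = new ArrayList<>();
        for (int[] xbat : xbatBlocks) {
            for (int offset : xbat) {
                if (batOffsets.size() >= batCount) {
                    return batOffsets;
                }
                batOffsets.add(offset);
            }
        }
        return batOffsets;
    }

    /** Simplified stand-in for loop detection: true if any offset repeats. */
    static boolean hasRepeatedOffset(List<Integer> offsets) {
        Set<Integer> seen = new HashSet<>();
        for (int offset : offsets) {
            if (!seen.add(offset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One XBAT block: two real BAT offsets, then unused slots
        // that all hold the same filler value (assumed to be 0 here).
        int[][] xbatBlocks = { {12, 13, 0, 0} };
        int batCount = 2; // what the header claims

        System.out.println(hasRepeatedOffset(readAllEntries(xbatBlocks)));       // true
        System.out.println(hasRepeatedOffset(readCapped(xbatBlocks, batCount))); // false
    }
}
```

With the cap in place, the filler slots are never read, so the repeated value never reaches the loop check.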
Comment 1 Gary King 2012-09-01 00:21:58 UTC
Created attachment 29315
patch fixing cycle detection in NPOI
Comment 2 Gary King 2012-09-01 00:34:09 UTC
Bugzilla isn't letting me upload the file; however, the file may be downloaded from http://www.slideshare.net/jbrenman/thirst.
Comment 3 Nick Burch 2013-02-04 12:52:56 UTC
Thanks for this. A slightly modified version was committed in r1442095. With that in place, I can now process that slideshare file without problems.