I have a document that gets stuck in createChp(). Debugging shows that inside createChp() the baseIndex is the same as the input parameter istd, and the parentCHP is null, so it recurses roughly 1021 times before failing with a stack overflow. Is the solution something like this?

    if(baseIndex != NIL_STYLE) {
        parentCHP = _styleDescriptions[baseIndex].getCHP();
        if(parentCHP == null && baseIndex != istd) {
            createChp(baseIndex);
            parentCHP = _styleDescriptions[baseIndex].getCHP();
        }
    }
Of course it's not that solution... parentCHP cannot be null. The offending StyleDescription is "Footnote Reference" and has the following attributes in the debugger:

    _baseLength = 10
    _bchUpe = 52
    _chp = null
    _infoShort = 38
    _infoShort2 = 370
    _infoShort3 = 369
    _infoShort4 = 0
    _istd = 0
    name = "Footnote Reference"
    _pap = null
    _upxs = IPX[1]

Any ideas on how to fix this?
I would much rather have it throw an exception, so I can handle it, instead of recursing until it crashes the thread.
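To illustrate the "throw instead of recurse" idea, here is a minimal sketch that walks a style's base-style chain iteratively and throws as soon as a style index is seen twice. It is not HWPF code: the class StyleCycleCheck, the method chainLength, the plain int[] table of base indices, and the NIL_STYLE value of 4095 are all assumptions made for the example.

```java
// Hypothetical stand-in for a style table: baseIndex[i] is the base style of
// style i, or NIL_STYLE when style i has no base. Not part of HWPF.
class StyleCycleCheck {
    // Assumed sentinel for "no base style".
    static final int NIL_STYLE = 4095;

    // Returns the number of styles in the chain starting at istd.
    // Throws IllegalStateException if the chain loops back on itself or
    // points outside the table, instead of recursing until the stack overflows.
    static int chainLength(int[] baseIndex, int istd) {
        boolean[] seen = new boolean[baseIndex.length];
        int length = 0;
        int current = istd;
        while (current != NIL_STYLE) {
            if (current < 0 || current >= baseIndex.length) {
                throw new IllegalStateException(
                    "Style " + istd + " refers to missing base style " + current);
            }
            if (seen[current]) {
                throw new IllegalStateException(
                    "Style " + istd + " has a cyclic base-style chain at index " + current);
            }
            seen[current] = true;
            length++;
            current = baseIndex[current];
        }
        return length;
    }
}
```

A caller could run a check like this before resolving a style and catch the IllegalStateException per document, so one bad file does not take down the whole thread.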
Adding baseIndex != istd (as Antony mentioned) stops the infinite recursion. It then throws a null pointer exception later in uncompressCHP, since parentCHP is null, which is easier to handle, even though it doesn't successfully extract the text. I added a null check as well, which made it work for my purposes (text extraction only):

    if(baseIndex != NIL_STYLE) {
        parentCHP = _styleDescriptions[baseIndex].getCHP();
        if(parentCHP == null && baseIndex != istd) {
            createChp(baseIndex);
            parentCHP = _styleDescriptions[baseIndex].getCHP();
        }
    }
    if(parentCHP != null) {
        chp = (CharacterProperties)CharacterSprmUncompressor.uncompressCHP(parentCHP, chpx, 0);
    }

Thanks for the useful library, by the way.
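The behaviour of that workaround can be tried out on a toy model. The sketch below is an assumption-laden stand-in, not the HWPF classes: GuardedStyleResolver is a hypothetical name, strings replace CharacterProperties and the CHPX bytes, string concatenation replaces uncompressCHP, and NIL_STYLE is assumed to be 4095.

```java
// Toy model of the guarded createChp(); all names and types are hypothetical.
class GuardedStyleResolver {
    // Assumed sentinel for "no base style".
    static final int NIL_STYLE = 4095;

    final int[] baseIndex;    // base style of each style
    final String[] delta;     // stand-in for each style's CHPX
    final String[] resolved;  // stand-in for each style's CHP, null until built

    GuardedStyleResolver(int[] baseIndex, String[] delta) {
        this.baseIndex = baseIndex;
        this.delta = delta;
        this.resolved = new String[baseIndex.length];
    }

    void createChp(int istd) {
        if (resolved[istd] != null || delta[istd] == null) {
            return; // already built, or nothing to build from
        }
        String parent = ""; // stand-in for a fresh, empty CharacterProperties
        int base = baseIndex[istd];
        if (base != NIL_STYLE) {
            parent = resolved[base];
            // The guard from the workaround: do not recurse into ourselves.
            if (parent == null && base != istd) {
                createChp(base);
                parent = resolved[base];
            }
        }
        // The null check from the workaround: skip styles with no usable parent.
        if (parent != null) {
            resolved[istd] = parent + delta[istd];
        }
    }
}
```

With a table where style 2 names itself as its base, createChp(2) returns without recursing and leaves that style unresolved, while well-formed styles resolve normally. Note that the guard only catches a style whose base is itself; in this model a longer loop (style 1 based on 2, style 2 based on 1) would still recurse without end.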
Can someone please give me an update on this? We get this problem in POI 3.6 and 3.7 on Linux and are keen to understand whether it is scheduled to be fixed in a particular release. Thanks - Chris
We need someone to figure out if the problem is caused by a bug in our chp decoding, or if it's correctly decoded but just not something we currently support. Once we know that, we can decide if the suggested fix is OK to apply, or if we need to revisit the chp decoding code to avoid getting into this situation in the first place.

HWPF currently lacks a pointman, so if this matters to you, please do investigate and report back!
From what I remember after looking at this issue, it is actually caused by a bad Word document. The parent of the style was the same as the style, or something like that, so it would keep recursing into the child/parent style until it got an overflow. I have no idea how the Word documents that gave us issues got into that state. I was going through and extracting text from thousands of documents and this error only happened on a handful (<20 out of 10k). The main issue for me was that the recursion crashed my entire web app, so my quick fix resolved that and it's been working fine since.

Wes
I think I've added a fix in r998625 that will hopefully switch the broken styles to be standalone. However, as no-one has uploaded a sample broken file, I can't be sure :/ If the problem still remains with the fix, please re-open the bug & upload a file for us to test against!
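For readers wondering what "standalone" could mean here, one possible reading is sketched below: a style that names itself as its own base is re-pointed at the nil style, so it is built from default properties instead of from a parent. This is only an illustration and may differ from what r998625 actually does; StandaloneFallback, detachSelfReferences, the int[] table, and the NIL_STYLE value of 4095 are assumptions.

```java
// Hypothetical illustration of treating self-referencing styles as standalone.
class StandaloneFallback {
    // Assumed sentinel for "no base style".
    static final int NIL_STYLE = 4095;

    // Returns a copy of the base-style table in which every style that lists
    // itself as its own base is re-pointed at NIL_STYLE. The input is not modified.
    static int[] detachSelfReferences(int[] baseIndex) {
        int[] fixed = baseIndex.clone();
        for (int istd = 0; istd < fixed.length; istd++) {
            if (fixed[istd] == istd) {
                fixed[istd] = NIL_STYLE;
            }
        }
        return fixed;
    }
}
```

The trade-off is that a broken style loses whatever it was meant to inherit, which is usually acceptable for text extraction but may change formatting.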