Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.24.1
- None
Description
This issue was reported by a user on discuss.elastic.co.
I can reproduce the problem using the latest version of Tika (1.24.1) in the FSCrawler project.
When extracting the data, we see:
java.lang.StackOverflowError: null
    at java.util.regex.Pattern$BmpCharPredicate.lambda$union$2(Pattern.java:5692) ~[?:?]
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:4019) ~[?:?]
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4855) ~[?:?]
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4763) ~[?:?]
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4886) ~[?:?]
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:4020) ~[?:?]
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4855) ~[?:?]
    at java.util.regex.Pattern$Branch.match(Pattern.java:4800) ~[?:?]
    at java.util.regex.Pattern$Branch.match(Pattern.java:4798) ~[?:?]
    at java.util.regex.Pattern$Branch.match(Pattern.java:4798) ~[?:?]
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4763) ~[?:?]
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4886) ~[?:?]
    at java.util.regex.Pattern$BmpCharPropertyGreedy.match(Pattern.java:4394) ~[?:?]
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4855) ~[?:?]
    at java.util.regex.Pattern$Branch.match(Pattern.java:4800) ~[?:?]
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4763) ~[?:?]
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4886) ~[?:?]
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:4020) ~[?:?]
    at java.util.regex.Pattern$BmpCharPropertyGreedy.match(Pattern.java:4394) ~[?:?]
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4855) ~[?:?]
    at java.util.regex.Pattern$Branch.match(Pattern.java:4800) ~[?:?]
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:4020) ~[?:?]
    at java.util.regex.Pattern$Start.match(Pattern.java:3673) ~[?:?]
    at java.util.regex.Matcher.search(Matcher.java:1729) ~[?:?]
    at java.util.regex.Matcher.find(Matcher.java:773) ~[?:?]
    at java.util.Formatter.parse(Formatter.java:2702) ~[?:?]
    at java.util.Formatter.format(Formatter.java:2655) ~[?:?]
    at java.util.Formatter.format(Formatter.java:2609) ~[?:?]
    at java.lang.String.format(String.java:3292) ~[?:?]
    at java.util.logging.SimpleFormatter.format(SimpleFormatter.java:176) ~[?:?]
    at java.util.logging.StreamHandler.publish(StreamHandler.java:199) ~[?:?]
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:95) ~[?:?]
    at java.util.logging.Logger.log(Logger.java:979) ~[?:?]
    at java.util.logging.Logger.doLog(Logger.java:1006) ~[?:?]
    at java.util.logging.Logger.logp(Logger.java:1172) ~[?:?]
    at org.apache.commons.logging.impl.Jdk14Logger.log(Jdk14Logger.java:87) ~[?:?]
    at org.apache.commons.logging.impl.Jdk14Logger.warn(Jdk14Logger.java:260) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:159) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:41) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:183) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:186) ~[?:?]
This looks like a problem in the PDFBox project rather than in Tika itself, but I thought it would be useful to report it here as well.
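For illustration only: the repeated enqueueKids frames suggest a recursive walk over the page tree that never terminates, or goes too deep, on this file. The sketch below is not PDFBox code and not the actual fix. It assumes the corrupt page tree contains a back-reference (a cycle), which I have not confirmed, and uses a hypothetical Node class as a stand-in for a page tree node. It shows how an iterative walk with a visited set avoids unbounded recursion in that case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class PageTreeWalk {
    /** Hypothetical stand-in for a page tree node; not a PDFBox class. */
    static final class Node {
        final String name;
        final List<Node> kids = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Collects leaf names with an explicit stack, visiting each node at most once. */
    static List<String> collectLeaves(Node root) {
        List<String> leaves = new ArrayList<>();
        // Identity-based set: two nodes are "the same" only if they are the same object.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited: a cycle or a kid shared by two parents
            }
            if (node.kids.isEmpty()) {
                leaves.add(node.name);
            } else {
                // Push in reverse so kids are visited in their listed order.
                for (int i = node.kids.size() - 1; i >= 0; i--) {
                    stack.push(node.kids.get(i));
                }
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node inner = new Node("inner");
        Node page1 = new Node("page1");
        Node page2 = new Node("page2");
        root.kids.add(inner);
        root.kids.add(page2);
        inner.kids.add(page1);
        inner.kids.add(root); // simulated corruption: kid points back to its ancestor
        System.out.println(collectLeaves(root)); // prints [page1, page2]
    }
}
```

A naive recursive version of collectLeaves would recurse forever on the back-reference above and end in a StackOverflowError, which is consistent with the trace, although the real cause in the attached file may differ.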
Attachments
Issue Links
- relates to: PDFBOX-5009 Corrupt PDF can lead to a StackOverflow (Closed)