Description
I am using the IdentityMapper in the HTMLparser with this simple document:
<html><head><title> my title </title> </head> <body> <frameset rows=\"20,*\"> <frame src=\"top.html\"> </frame> <frameset cols=\"20,*\"> <frame src=\"left.html\"> </frame> <frame src=\"invalid.html\"/> </frame> <frame src=\"right.html\"> </frame> </frameset> </frameset> </body></html>
Strangely the HTMLHandler is getting a call to endElement on the body BEFORE we reach frameset. As a result the variable bodylevel is decremented back to 0 and the remaining entities are ignored due to the logic implemented in HTMLHandler.
Any idea?
Attachments
Attachments
Issue Links
- is related to
-
TIKA-463 HtmlParser doesn't extract links from img, map, object, frame, iframe, area, link
- Resolved