Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.17
- None
Description
Hyperlinks in an HTML document are altered when the document is parsed via tika server:
curl -X PUT --upload-file tika_adds_shape_to_hyperlink.html http://localhost:9998/tika --header "Accept: text/html"
sent:
<div>
<a href="http://www.google.com">http://www.google.com</a>
</div>
received back:
<a shape="rect" href="http://www.google.com">http://www.google.com</a>
The divs are gone and a shape attribute has been added to the link.
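The difference between the two snippets can also be checked mechanically. The following is a minimal sketch using only Python's standard `html.parser`; `TagCollector` and `collect_tags` are hypothetical helper names introduced for illustration and are not part of Tika. The two input strings are copied from the sent and received snippets in this report.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records (tag, sorted attribute names) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the names
        self.tags.append((tag, sorted(name for name, _ in attrs)))


def collect_tags(html):
    """Hypothetical helper: parse a fragment and return its start tags."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


# Snippets copied from the report above
sent = '<div>\n<a href="http://www.google.com">http://www.google.com</a>\n</div>'
received = '<a shape="rect" href="http://www.google.com">http://www.google.com</a>'

sent_tags = collect_tags(sent)
received_tags = collect_tags(received)

print("sent:    ", sent_tags)
print("received:", received_tags)
```

Comparing the two lists shows both symptoms at once: the `div` start tag is missing from the received output, and the `a` element carries an extra `shape` attribute.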
Attachments
Issue Links
- is related to TIKA-1599 Switch from TagSoup to JSoup (Resolved)