[TIKA-2169] Fix xhtml in combination OCR+metadata extraction from images - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.15, 2.0.0
Component/s: None
Labels: None

Description

In trunk, when Tesseract is available, I'm getting a nested <html> element wrapping the image's metadata inside the outer document:
<html>
ocr content
<html>
...metadata
</html>
</html>
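
One way to check parser output for this symptom is to count <html> elements that sit below the document root. The following is only a minimal sketch, assuming the output is well-formed XML; the helper names (local_name, count_nested_html) and the "flat" comparison string are hypothetical illustrations and are not part of Tika.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_nested_html(xhtml: str) -> int:
    """Count <html> elements that appear below the document root."""
    root = ET.fromstring(xhtml)
    # root.iter() yields the root itself first, so exclude it explicitly.
    return sum(
        1
        for elem in root.iter()
        if elem is not root and local_name(elem.tag) == "html"
    )


# Shape reported in this issue: a second <html> wraps the metadata.
buggy = "<html>ocr content<html>...metadata</html></html>"
# Hypothetical single-root shape, assumed here only for comparison.
flat = "<html>ocr content ...metadata</html>"

print(count_nested_html(buggy))  # 1
print(count_nested_html(flat))   # 0
```

A non-zero count indicates the nesting described above. Namespaced output, such as a root declaring the XHTML namespace, is handled by comparing local names only.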

People

Assignee: Unassigned

Reporter: Tim Allison

Votes: 0

Watchers: 3

Dates

Created: 07/Nov/16 12:42

Updated: 12/Apr/21 12:58

Resolved: 28/Nov/16 15:42