Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
My XML file contains a <body> tag, so Tika's detect() method reports the content type as "text/html" instead of XML. Note: the file name has no extension.
Example: the XML file looks like this:
<?xml version="1.0"?>
<body>
<a>
<b></b>
<c></c>
</a>
</body>
</xml>
Is there another method or approach that would detect this file as XML instead of HTML?
Thank you in advance.
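For illustration, one possible workaround is to check for the XML declaration before calling the existing detector. The sketch below is an assumption, not part of Tika: the class `XmlPrefixCheck`, its methods, and the `fallback` parameter are hypothetical names, and the lambda in `main` only stands in for whatever detector currently returns "text/html". It handles an optional UTF-8 byte order mark but not UTF-16 input, and it would also classify XHTML documents that begin with `<?xml` as XML, so treat it as a heuristic.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical helper, not part of Tika: checks for an XML declaration
// before handing the bytes to an existing detector.
final class XmlPrefixCheck {

    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the data (after an optional UTF-8 byte order mark)
    // begins with the five bytes "<?xml".
    static boolean startsWithXmlDeclaration(byte[] data) {
        int offset = 0;
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            offset = 3; // skip the UTF-8 BOM
        }
        if (data.length - offset < XML_DECL.length) {
            return false;
        }
        for (int i = 0; i < XML_DECL.length; i++) {
            if (data[offset + i] != XML_DECL[i]) {
                return false;
            }
        }
        return true;
    }

    // Reports "application/xml" when the declaration is present; otherwise
    // defers to the supplied fallback detector.
    static String detect(byte[] data, Function<byte[], String> fallback) {
        return startsWithXmlDeclaration(data) ? "application/xml" : fallback.apply(data);
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\"?>\n<body>\n<a>\n<b></b>\n<c></c>\n</a>\n</body>\n"
                .getBytes(StandardCharsets.UTF_8);
        // The lambda stands in for the existing detector that returned "text/html".
        System.out.println(detect(sample, bytes -> "text/html"));
    }
}
```

Because the check looks only at the first few bytes, it does not depend on the file name or on the document being well-formed. In practice the fallback would be the detector already in use, so files without an XML declaration keep their current result.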
Issue Links
- duplicates TIKA-1842: XML file detected as HTML (Open)