Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
When Tika is asked to detect a MIME type given only a URL containing ".php" and a content-type hint of "text/html", it returns "text/x-php", whereas one would expect "text/html".
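import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

TikaConfig tika = new TikaConfig();
Metadata metadata = new Metadata();
String url = "https://www.facebook.com/home.php";
// Supply only the resource name (the URL) and a content-type hint; no input stream.
metadata.set(Metadata.RESOURCE_NAME_KEY, url);
metadata.set(Metadata.CONTENT_TYPE, "text/html");
MediaType type = tika.getDetector().detect(null, metadata);
System.out.println(url + " is of type " + type.toString());
// Prints https://www.facebook.com/home.php is of type text/x-php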
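Until the detector itself handles this case, a caller could decide which answer to trust before using the detected type. The following is only a minimal sketch of that idea in plain Java, with no Tika calls: the class `HintPreference`, its methods, and the list of script extensions are all hypothetical names chosen for illustration, and the rule it applies (prefer the declared content type for http(s) URLs whose path ends in a server-side script extension) is an assumption, not Tika's behavior.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class HintPreference {
    // Hypothetical list of server-side script extensions; not taken from Tika.
    private static final Set<String> SCRIPT_EXTENSIONS =
            Set.of("php", "jsp", "asp", "aspx", "cgi");

    /** Returns true when the URL is http(s) and its path ends in a script extension. */
    static boolean isRemoteScriptUrl(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI, so treat it as a plain file name
        }
        String scheme = uri.getScheme();
        String path = uri.getPath();
        if (scheme == null || path == null) {
            return false; // relative reference or opaque URI
        }
        String lowerScheme = scheme.toLowerCase(Locale.ROOT);
        if (!lowerScheme.equals("http") && !lowerScheme.equals("https")) {
            return false;
        }
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < path.lastIndexOf('/')) {
            return false; // no extension in the last path segment
        }
        String extension = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SCRIPT_EXTENSIONS.contains(extension);
    }

    /** Prefers the declared content type over the name-based guess for remote script URLs. */
    static String chooseType(String url, String declaredType, String nameBasedGuess) {
        if (declaredType != null && isRemoteScriptUrl(url)) {
            return declaredType;
        }
        return nameBasedGuess;
    }
}
```

With the values from the report, `HintPreference.chooseType(url, "text/html", type.toString())` would select the declared "text/html" for the facebook.com URL, while a local file name such as "home.php" would keep the name-based guess.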