- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels: None
When using Tika to detect a MIME type given only a URL containing ".php" and a content-type hint of "text/html", it guesses "text/x-php", whereas one would expect "text/html".
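import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

TikaConfig tika = new TikaConfig();
Metadata metadata = new Metadata();
String url = "https://www.facebook.com/home.php";
// Only the resource name and a content-type hint are supplied; there is no stream.
metadata.set(Metadata.RESOURCE_NAME_KEY, url);
metadata.set(Metadata.CONTENT_TYPE, "text/html");
MediaType type = tika.getDetector().detect(null, metadata);
System.out.println(url + " is of type " + type.toString());
// Prints https://www.facebook.com/home.php is of type text/x-php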
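To make the expectation concrete, here is a minimal, self-contained sketch of the precedence I would have expected: an explicit content-type hint wins over a guess based on the file extension. It does not use Tika and is not how Tika implements detection; the class HintFirstDetection and the methods guessFromName and resolve are hypothetical names made up for this illustration, and the extension table is deliberately tiny.

```java
import java.util.Locale;

public class HintFirstDetection {

    // Hypothetical stand-in for a name-based guess; it only knows a few extensions.
    static String guessFromName(String resourceName) {
        String name = resourceName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".php")) {
            return "text/x-php";
        }
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return "text/html";
        }
        return "application/octet-stream";
    }

    // Assumed precedence: a non-empty content-type hint beats the name-based guess.
    static String resolve(String resourceName, String contentTypeHint) {
        if (contentTypeHint != null && !contentTypeHint.trim().isEmpty()) {
            return contentTypeHint.trim().toLowerCase(Locale.ROOT);
        }
        return guessFromName(resourceName);
    }

    public static void main(String[] args) {
        String url = "https://www.facebook.com/home.php";
        // With a hint, the hint is returned.
        System.out.println(url + " is of type " + resolve(url, "text/html"));
        // Without a hint, the extension decides.
        System.out.println(url + " is of type " + resolve(url, null));
    }
}
```

With the hint set, this sketch reports text/html for the URL above; without it, the extension-based guess text/x-php is used. Whether Tika should always let the hint override the name is a design question, since a server-supplied content type can also be wrong.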