Detection of Graphviz document formats could be improved by adding
- either *.dot as a glob pattern (conflicts with the more frequent MS Word templates), or
- a magic pattern which matches the start of the DOT language grammar, e.g. ^\s*(?:strict\s+)?(?:di)?graph\b
Seen with Common Crawl data (see also discussions on user@tika and dev@poi): the web server sends "text/vnd.graphviz" (often wrong) and Tika detects "application/msword" (sometimes wrong); see WARC file.
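
A minimal sketch of how the proposed magic pattern behaves on a few leading byte sequences. It is written in Python for quick experimentation and is not Tika's detector code. The helper name looks_like_dot and the sample inputs are made up for illustration; the regular expression is the one proposed above, unchanged.

```python
import re

# Pattern copied verbatim from the proposal. It is compiled as a bytes
# pattern because a detector only sees the raw leading bytes of a document.
DOT_MAGIC = re.compile(rb"^\s*(?:strict\s+)?(?:di)?graph\b")


def looks_like_dot(head: bytes) -> bool:
    """Hypothetical helper: True if the leading bytes match the pattern."""
    return DOT_MAGIC.match(head) is not None


# Illustrative inputs (assumptions, not taken from the WARC file).
samples = {
    "digraph": b"digraph G { a -> b; }",
    "strict graph": b"strict graph {\n  a -- b\n}",
    "leading whitespace": b"\n  graph g {}",
    # OLE2 compound file signature, as found at the start of legacy
    # MS Word documents and templates.
    "ole2": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
    # A DOT file that opens with a comment is NOT matched by this pattern.
    "comment first": b"// generated\ndigraph G {}",
}

for name, head in samples.items():
    print(f"{name}: {looks_like_dot(head)}")
```

Two limitations are visible in this sketch: a file that starts with a comment is missed, and the match is case-sensitive even though DOT keywords are case-independent. Whether either matters in practice would need checking against real crawl data.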