Description
What steps will reproduce the problem?
1. paste the content of this file into any23.org
2. press extract
What is the expected output? What do you see instead?
triples
but instead you will see
No suitable extractor found for this media type
What version of the product are you using?
0.6.1
Please provide any additional information below.
The problem occurs when there is a long comment at the top of the file.
If you repeat the operation after deleting the last word, "sections", from the first line, extraction works fine.
The proposed solution:
It might be worth doing the following:
If no suitable extractor is found on the first attempt,
try removing blank lines and Turtle-style comments
from the source:
skip a line if it matches
line.matches("^\\s+$") // remove empty line
or
line.matches("^\\s*#.*$") // remove line which starts with # or white space and #
and then check for turtle mime type again
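The fallback described above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical and not part of the Any23 API, and the retry of MIME type detection is left to the caller. It also assumes that "#" at the start of a line always begins a comment, which would not hold inside a multi-line Turtle string literal. Note that the sketch uses \s* rather than \s+ for the first pattern, so that completely empty lines are dropped as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Any23: pre-processes a document
// before a second attempt at Turtle MIME type detection.
class TurtleCommentStripper {

    // Returns the source without blank lines and without lines that
    // consist only of a Turtle-style comment (optional whitespace, then '#').
    static String stripBlankAndCommentLines(String source) {
        List<String> kept = new ArrayList<>();
        for (String line : source.split("\\R")) {   // \R matches any line break
            if (line.matches("^\\s*$")) {
                continue;                           // empty or whitespace-only line
            }
            if (line.matches("^\\s*#.*$")) {
                continue;                           // line that starts with '#'
            }
            kept.add(line);                         // trailing comments are left alone
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String input = "# a long comment at the top of the file\n"
                + "\n"
                + "   # an indented comment\n"
                + "@prefix ex: <http://example.org/> .\n"
                + "ex:a ex:b ex:c .\n";
        // Only the @prefix line and the triple should remain.
        System.out.println(stripBlankAndCommentLines(input));
    }
}
```

The stripped text would then be handed back to the detector only when the first detection attempt fails, so documents that are already recognized are not touched.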
Attachments
Issue Links
- is duplicated by ANY23-98 TikaMIMEtypeDetector doesn't recognize certain file formats when they contain header comments (Closed)