Looks like the encoding detection heuristics need some adjustment.
TIKA-868 TXT parser does not honour the specified encoding
TIKA-2771 enableInputFilter() wrecks charset detection for some short html documents
TIKA-2047 TXTParser overwrites mime type/masks types that are subtype of text