Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.24.1
Fix Version/s: None
Environment: Windows 10 x64, OpenJDK 14
Description
In default mode, RFC822Parser seems to ignore the charset declared in the headers when detecting the content. When I set "extractAllAlternatives" to false, the content seems fine.
Test case:
@Test
public void testQuotedPrintableCharset() {
    Metadata metadata = new Metadata();
    InputStream stream = getStream("test-documents/testRFC822_quoted_charset_iso_8859_2");
    ContentHandler handler = new BodyContentHandler();
    ParseContext context = new ParseContext();
    try {
        RFC822Parser emailparser = new RFC822Parser();
        emailparser.setExtractAllAlternatives(true);
        emailparser.parse(stream, handler, metadata, context);
        String bodyText = handler.toString();
        assertTrue(bodyText.contains("Dzie\u0144 dobry."));
    } catch (Exception e) {
        fail("Exception thrown: " + e.getMessage());
    }
}
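
For context on why the header charset matters, here is a minimal standalone sketch using only the JDK (it is not Tika code). It assumes the test document encodes the body as quoted-printable in ISO-8859-2, so that "ń" appears as `=F1`; the sample string and the class and method names are hypothetical, and the decoder handles only `=XX` escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuotedPrintableCharsetDemo {

    // Simplified quoted-printable decoder: turns "=XX" escapes into raw bytes.
    // Soft line breaks and malformed escapes are not handled in this sketch.
    static byte[] decodeQuotedPrintable(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed sample body: byte 0xF1 is U+0144 in ISO-8859-2
        byte[] raw = decodeQuotedPrintable("Dzie=F1 dobry.");

        // Decoding with the charset declared in the Content-Type header
        String withHeaderCharset = new String(raw, Charset.forName("ISO-8859-2"));
        // Decoding with a different charset, as if the header were ignored
        String withOtherCharset = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(withHeaderCharset.equals("Dzie\u0144 dobry."));
        System.out.println(withOtherCharset.equals("Dzie\u0144 dobry."));
    }
}
```

The same bytes only match the expected text when decoded with the declared charset, which is consistent with the assertion in the test case failing if the parser picks a different one.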