Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 1.15, 1.16, 1.17
None
None
Environment: Ubuntu 64 bit, JDK 1.8
Description
I have a PDF file that returns two elements in the recursive JSON output. The first element is the extracted text, as expected. The second element appears to be a fragment of a PDF file rather than extracted text.
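For anyone trying to reproduce this, a minimal sketch of how the elements of the recursive JSON output could be listed, assuming the output is a JSON array of metadata objects using the keys seen in this report ("Content-Type", "X-Parsed-By", "X-TIKA:content"). The function name `summarize_elements` and the sample data are hypothetical and only illustrate the shape of the output; they are not taken from the actual file.

```python
import json


def summarize_elements(recursive_json, preview_chars=40):
    """Return one summary dict per element of a recursive JSON metadata list."""
    elements = json.loads(recursive_json)
    summaries = []
    for index, element in enumerate(elements):
        parsers = element.get("X-Parsed-By", [])
        if isinstance(parsers, str):
            # A single parser may be serialized as a plain string.
            parsers = [parsers]
        content = element.get("X-TIKA:content") or ""
        summaries.append({
            "index": index,
            "content_type": element.get("Content-Type"),
            "last_parser": parsers[-1] if parsers else None,
            "preview": content.strip()[:preview_chars],
        })
    return summaries


# Hypothetical, shortened stand-in for the real output (assumption, not real data).
sample = json.dumps([
    {
        "Content-Type": "application/pdf",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.pdf.PDFParser",
        ],
        "X-TIKA:content": "Some extracted text",
    },
    {
        "Content-Type": "text/plain; charset=ISO-8859-1",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.txt.TXTParser",
        ],
        "X-TIKA:content": "<<\n /ASCII85EncodePages false\n",
    },
])

result = summarize_elements(sample)
for summary in result:
    print(summary["index"], summary["content_type"], summary["last_parser"])
```

In a listing like this, the element in question is the one whose last parser is the TXTParser and whose content starts with "<<".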
The start of the second element in the JSON output is:
{
"Content-Encoding": "ISO-8859-1",
"Content-Length": "-1",
"Content-Type": "text/plain; charset\u003dISO-8859-1",
"X-Parsed-By": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.txt.TXTParser"
],
"X-TIKA:content": "\u003c\u003c\n /ASCII85EncodePages false\n /AllowTransparency false\n /AutoPositionEPSFiles true\n /AutoRotatePages /None\n /Binding /Left\n /CalGrayProfile (Gray Gamma 2.2)\n /CalRGBProfile (sRGB IEC61966-2.1)\n /CalCMYKProfile (U.S. Web Coated \\050SWOP\\051 v2)\n /sRGBProfile (sRGB IEC61966-2.1)\n /CannotEmbedFontPolicy /Warning\n /CompatibilityLevel 1.4\n /CompressObjects /Off\n /CompressPages true\n /ConvertImagesToIndexed true\n /PassThroughJPEGImages true\n /CreateJobTicket false\n /DefaultRenderingIntent /Default\n /DetectBlends true\n /DetectCurves 0.0000\n /ColorConversionStrategy /LeaveColorUnchanged\n /DoThumbnails true\n /EmbedAllFonts true\n /EmbedOpenType false\n /ParseICCProfilesInComments true\n /EmbedJobOptions true\n /DSCReportingLevel 0\n /EmitDSCWarnings false\n /EndPage 1\n /ImageMemory 1048576\n /LockDistillerParams true\n /MaxSubsetPct 100\n /Optimize true\n /OPM 0\n /ParseDSCComments false\n /ParseDSCCommentsForDocInfo false\n /PreserveCopyPage true\n /PreserveDICMYKValues true\n /PreserveEPSInfo false\n /PreserveFlatness true\n /PreserveHalftoneInfo true\n /PreserveOPIComments false\n /PreserveOverprintSettings true\n /StartPage 1\n /SubsetFonts true\n /TransferFunctionInfo /Remove\n /UCRandBGInfo /Preserve\n /UsePrologue false\n /ColorSettingsFile ()\n /AlwaysEmbed [ true\n /AbadiMT-CondensedLight\n /ACaslon-Italic\n /ACaslon