I dug into this one some more.
Handling spaces between words is tricky in PDF! This is because a PDF
need not actually include space characters; instead it can (and does!)
simply place the glyphs at x/y positions with a visual gap between
them. This happens easily even for whitespace-delimited languages.
Yet sometimes PDFs do include the space characters themselves (the attached
PDF is such an example). Ideally we would be able to detect this somehow
(e.g. if such a PDF is encoded differently internally, or something), but
I don't know how to do this, or whether it's even possible.
So for the time being I made a simple addition to PDFParser, adding an
option set/getEnableAutoSpace, defaulting to enabled (i.e. keeping
today's behavior). So at least if an app hits PDFs like the one
attached here, or somehow knows its PDFs always include explicit
space characters, it can turn this option off.
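
To make the idea concrete, here is a minimal sketch of what an auto-space switch does conceptually. It is not the actual PDFParser code: the class AutoSpaceSketch, the Glyph type, and the GAP_FRACTION threshold are all hypothetical names and values chosen for illustration. With the flag on, a space is inserted wherever the horizontal gap between two glyphs looks like a word break; with the flag off, only the characters actually present in the PDF are emitted.

```java
import java.util.List;

public class AutoSpaceSketch {
    /** A positioned glyph: the character, its left x coordinate, and its width. */
    public static final class Glyph {
        final char ch;
        final double x;
        final double width;

        public Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    // Hypothetical threshold: a gap wider than half the previous glyph's
    // width is treated as a word break. Real extractors tune this differently.
    static final double GAP_FRACTION = 0.5;

    /** Joins the glyphs of one text line, optionally inserting inferred spaces. */
    public static String join(List<Glyph> line, boolean enableAutoSpace) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (enableAutoSpace && prev != null) {
                double gap = g.x - (prev.x + prev.width);
                // Do not add a second space next to an explicit space character.
                boolean alreadySpaced = prev.ch == ' ' || g.ch == ' ';
                if (!alreadySpaced && gap > GAP_FRACTION * prev.width) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }
}
```

For a line whose glyphs are "ab" and "cd" separated only by a positional gap, join(line, true) recovers the word break, while join(line, false) returns the glyphs run together. A line that already carries an explicit space character comes out the same either way in this sketch.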