Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Using the attached docx file, when I parse it with the /unpack endpoint I get a _TEXT_ file that contains this:
[[bookmark: _GoBack]Launching ms word Sadfsadfsaf Asdfsafsafasfsafd Asdf2 Asfd3 asfd
But when I parse it with /rmeta/text I get an X-TIKA:content field that contains:
Launching ms word Sadfsadfsaf Asdfsafsafasfsafd Asdf2 Asfd3 asfd
Why do these differ? It seems like there are a bunch of leading \n characters at the start of the /rmeta/text output. There is also this strange [[bookmark: _GoBack] in the /unpack output that I wasn't expecting, and I'm not sure what it means. Perhaps they are just fundamentally different outputs and this is normal behavior?
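For anyone trying to reproduce the comparison, here is a rough sketch of how the two outputs could be fetched and diffed. It is not taken from the original report and rests on several assumptions: a tika-server listening on `http://localhost:9998`, the extracted text being served from `/unpack/all` as a zip archive whose text member is named `__TEXT__` (the report writes it as _TEXT_), and `/rmeta/text` returning a JSON list whose first record holds `X-TIKA:content`. The helper names (`put`, `text_from_unpack`, `text_from_rmeta`, `describe_difference`, `compare`) are made up for illustration.

```python
import io
import json
import urllib.request
import zipfile

TIKA_URL = "http://localhost:9998"  # assumption: local tika-server on its usual port


def put(path, data, accept):
    """PUT the document bytes to a tika-server endpoint and return the raw body."""
    req = urllib.request.Request(
        TIKA_URL + path, data=data, method="PUT", headers={"Accept": accept}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def text_from_unpack(zip_bytes, member="__TEXT__"):
    """Read the text member out of the zip returned by the unpack endpoint.

    The member name is an assumption; pass a different one if your archive differs.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(member).decode("utf-8")


def text_from_rmeta(json_bytes):
    """Return X-TIKA:content from the first (container) record of the rmeta JSON."""
    records = json.loads(json_bytes)
    return records[0].get("X-TIKA:content", "")


def leading_newlines(text):
    """Count the newline characters at the very start of the text."""
    return len(text) - len(text.lstrip("\n"))


def describe_difference(unpack_text, rmeta_text):
    """Summarise how the two extracted texts differ."""
    return {
        "unpack_leading_newlines": leading_newlines(unpack_text),
        "rmeta_leading_newlines": leading_newlines(rmeta_text),
        "same_after_strip": unpack_text.strip() == rmeta_text.strip(),
    }


def compare(doc_path):
    """Send one file to both endpoints (paths are assumptions) and compare the text."""
    with open(doc_path, "rb") as fh:
        data = fh.read()
    unpack_text = text_from_unpack(put("/unpack/all", data, "application/zip"))
    rmeta_text = text_from_rmeta(put("/rmeta/text", data, "application/json"))
    return describe_difference(unpack_text, rmeta_text)
```

Calling `compare("attached.docx")` (a hypothetical file name) against a running server would show how many leading newlines each output has and whether the two texts match once surrounding whitespace is stripped. If `same_after_strip` is still false, the remaining difference is in the content itself, such as the bookmark marker.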