Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.8.0-incubator
-
None
Description
As discussed on the mailing list, the extraction test case seems to fail on non-Windows platforms.
The troublesome test file is ample_fonts_solidconvertor.pdf, and the textextract.log file says the following (^@ is U+0000 and � is U+FFFD):
Lines differ at index expected:46-253 actual:46-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
expected line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
actual line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
Lines differ at index expected:4-253 actual:4-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
expected line was: "AY^A~@ý^@á^@í^@é"
actual line was: "AY^A~@�@�@�^@�"
Lines differ at index expected:52-253 actual:52-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
expected line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
actual line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
Lines differ at index expected:4-253 actual:4-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
expected line was: "AY^A~@ý^@á^@í^@é"
actual line was: "AY^A~@�@�@�^@�"
Preparing to parse sample_fonts_solidconvertor.pdf for sorted test
Lines differ at index expected:46-253 actual:46-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
expected line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
actual line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
Lines differ at index expected:0-253 actual:0-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
expected line was: "@ý@á^@í^@é"
actual line was: "@�@�@�@�"
Lines differ at index expected:52-253 actual:52-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
expected line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
actual line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
Lines differ at index expected:4-253 actual:4-65533
FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
expected line was: "A~^AY@ý^@á^@í^@é"
actual line was: "A~^AY@�@�@�^@�"