Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-568

testextract failure on Linux and Mac OS X

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.8.0-incubator
    • 1.3.1
    • Text extraction
    • None

    Description

      As discussed on the mailing list, the extraction test case seems to fail on non-Windows platforms.

      The troublesome test file is ample_fonts_solidconvertor.pdf, and the textextract.log file says the following (^@ is U+0000 and � is U+FFFD):

      Lines differ at index expected:46-253 actual:46-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
      expected line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
      actual line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
      Lines differ at index expected:4-253 actual:4-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
      expected line was: "AY^A~@ý^@á^@í^@é"
      actual line was: "AY^A~@�@�@�^@�"
      Lines differ at index expected:52-253 actual:52-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
      expected line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
      actual line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
      Lines differ at index expected:4-253 actual:4-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
      expected line was: "AY^A~@ý^@á^@í^@é"
      actual line was: "AY^A~@�@�@�^@�"
      Preparing to parse sample_fonts_solidconvertor.pdf for sorted test
      Lines differ at index expected:46-253 actual:46-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 8 at actual line: 8
      expected line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
      actual line was: "@V@e^@r^@d^@a^@n^@a^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
      Lines differ at index expected:0-253 actual:0-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 10 at actual line: 10
      expected line was: "@á^@í^@é"
      actual line was: "@�@�@�@�"
      Lines differ at index expected:52-253 actual:52-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 11 at actual line: 11
      expected line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@ý^@ @t@e^@x^@t^@ @s@ ^A"
      actual line was: "@S@a^@n^@s^@ @s@e^@r^@i^@f^@:@ ^@T@o^@t^@o^@ @j@e^@ @p@o^@k^@u^@s^@n^@�@ ^@t@e^@x^@t^@ @s@ ^A"
      Lines differ at index expected:4-253 actual:4-65533
      FAILURE: Line mismatch for file sample_fonts_solidconvertor.pdf at expected line: 13 at actual line: 13
      expected line was: "A~^AY@ý^@á^@í^@é"
      actual line was: "A~^AY@�@�@�^@�"

      Attachments

        Activity

          People

            Unassigned Unassigned
            jukkaz Jukka Zitting
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: