  1. PDFBox
  2. PDFBOX-756

Some characters from TeX-created files are mapped into ASCII range 1-31

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 2.0.19, 3.0.0 PDFBox
    • Fix Version/s: 2.0.20, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels: None
    • Environment: Mac OS X 10.6.4

    Description

      For some TeX-created files, some characters are mapped to low ASCII values (1-31) during text extraction. Example: PDFBox's output appears as
      fx 2y − fx − 2y
      instead of
      f(x + 2y) − f(x − 2y) =
      With the non-printable characters denoted by \xN, PDFBox's result is
      f\x3x\x4 2y\x5 − f\x3x − 2y\x5 \x6
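      This probably cannot be fixed, since in another file the same numbers represent different characters. There, PDFBox's output appears as
      Za {a, a 1, . . .}
      instead of
      Z(a) = {a, a + 1, . . .}
      and the result with the non-printable characters marked is
      Z\x4a\x5 \x6 {a, a \x7 1, . . .}
      So \x4 stands for "+" in the first file but for "(" in the second.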
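
      To reproduce the \xN notation used above, the extracted text can be post-processed so that control characters become visible. The following is only a minimal sketch, not part of PDFBox: the class name ControlCharEscaper and the method escape are hypothetical, N is assumed to be written in hexadecimal, and tab, line feed, and carriage return are assumed to be legitimate whitespace and left alone.

```java
public class ControlCharEscaper {

    // Replace every character in the range 1-31, except tab, line feed,
    // and carriage return, with a visible \xN marker (N in hexadecimal).
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean whitespace = c == '\t' || c == '\n' || c == '\r';
            if (c >= 1 && c <= 31 && !whitespace) {
                out.append("\\x").append(Integer.toHexString(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical extracted text in which codes 3, 4, 5, and 6
        // stand in for the parentheses, plus sign, and equals sign.
        String extracted = "f\u0003x\u0004 2y\u0005 \u2212 f\u0003x \u2212 2y\u0005 \u0006";
        System.out.println(escape(extracted));
    }
}
```

      Running escape over the text of each file makes it easy to compare which codes occur where, which is how the differing meanings of the same code in the two files show up.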

      Attachments

        1. 826130.pdf
          552 kB
          Thomas Fischer
        2. 826130.txt
          32 kB
          Thomas Fischer

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Thomas Fischer (thomas_gb)