PDFBox: PDFBOX-4000

Wrong line break detection before ordinal indicator superscripts.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.6, 2.0.7, 2.0.8
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Windows 10 64-bit

    Description

      The 3 attached documents contain lines similar to "THIS AGREEMENT is made as of the 5th day of February, 2016." PDFBox returns this line as 3 separate lines:
      THIS AGREEMENT is made as of the 5
      th
      day of February, 2016.

      In each document, the line appears near the top.
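      A minimal sketch of why a raised superscript can split one line into three, assuming a simplified model in which each text run has a top and bottom coordinate (y grows downward). `LineGrouping`, `Run`, `groupByBaseline` and `groupByOverlap` are hypothetical illustrations, not PDFBox API, and the coordinates are made up; PDFBox's actual line detection in `PDFTextStripper` is more involved. Starting a new line whenever the baseline changes reproduces the 3-line output above, whereas joining runs whose vertical extents overlap keeps "5th" on one line.

```java
import java.util.ArrayList;
import java.util.List;

public class LineGrouping {
    // Hypothetical text run: its text plus vertical extent in page units (y grows downward).
    record Run(String text, double top, double bottom) {}

    // Naive rule: start a new line whenever the baseline (bottom edge) changes.
    static List<String> groupByBaseline(List<Run> runs) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = null;
        double baseline = 0;
        for (Run r : runs) {
            if (current != null && r.bottom() == baseline) {
                current.append(r.text());
            } else {
                if (current != null) lines.add(current.toString().strip());
                current = new StringBuilder(r.text());
                baseline = r.bottom();
            }
        }
        if (current != null) lines.add(current.toString().strip());
        return lines;
    }

    // Overlap rule: a run joins the current line if its vertical extent overlaps the
    // line's extent, so a raised superscript such as "th" stays on the same line.
    static List<String> groupByOverlap(List<Run> runs) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = null;
        double lineTop = 0, lineBottom = 0;
        for (Run r : runs) {
            if (current != null && r.top() < lineBottom && r.bottom() > lineTop) {
                current.append(r.text());
                // Grow the line's extent to cover the joined run.
                lineTop = Math.min(lineTop, r.top());
                lineBottom = Math.max(lineBottom, r.bottom());
            } else {
                if (current != null) lines.add(current.toString().strip());
                current = new StringBuilder(r.text());
                lineTop = r.top();
                lineBottom = r.bottom();
            }
        }
        if (current != null) lines.add(current.toString().strip());
        return lines;
    }

    public static void main(String[] args) {
        // Made-up coordinates for the sentence from this report; "th" is raised.
        List<Run> runs = List.of(
            new Run("THIS AGREEMENT is made as of the 5", 100, 112),
            new Run("th", 97, 105),
            new Run(" day of February, 2016.", 100, 112));
        System.out.println(groupByBaseline(runs)); // 3 lines
        System.out.println(groupByOverlap(runs));  // 1 line
    }
}
```

      An overlap rule trades one failure mode for another: tightly spaced lines or tall glyphs can also overlap and be merged, so a real fix would likely need some tolerance rather than a plain overlap test.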

      Attachments

        1. PDFBOX-4000-reduced.pdf (2 kB, Tilman Hausherr)
        2. nk7-p19.pdf (470 kB, Tilman Hausherr)
        3. contract_00968_SEDAR.pdf (283 kB, Harun Reşit Zafer)
        4. contract_00882_SEDAR.pdf (332 kB, Harun Reşit Zafer)
        5. contract_00569_SEDAR-marked-1.png (320 kB, Tilman Hausherr)
        6. contract_00569_SEDAR-experimental.txt (32 kB, Tilman Hausherr)
        7. contract_00569_SEDAR.pdf (200 kB, Harun Reşit Zafer)


          People

            Assignee: Unassigned
            Reporter: Harun Reşit Zafer (hrzafer)
            Votes: 0
            Watchers: 2
