PDFBOX-4000: Wrong line break detection before ordinal indicator superscripts

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.6, 2.0.7, 2.0.8
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      Windows 10 64-bit

      Description

      The three attached documents contain lines similar to "THIS AGREEMENT is made as of the 5th day of February, 2016.", where the ordinal indicator "th" is set as a superscript. PDFBox returns such a line as three separate lines:
      THIS AGREEMENT is made as of the 5
      th
      day of February, 2016.

      In each document, the affected line is near the top.
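
      Until the line break detection itself is fixed, the extracted text can be patched up after the fact. The following is only a rough sketch of such a workaround, not part of PDFBox: the class name OrdinalLineJoiner and its join method are hypothetical, and it assumes the input is the plain string returned by text extraction. It rejoins a line ending in a digit with a following line that consists solely of "st", "nd", "rd" or "th", and then appends the line after that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; it does not use or modify PDFBox.
public class OrdinalLineJoiner {
    // A line whose last character is a digit, e.g. "... as of the 5".
    private static final Pattern ENDS_WITH_DIGIT = Pattern.compile(".*\\d");
    // A line that is nothing but an English ordinal suffix.
    private static final Pattern ORDINAL_SUFFIX = Pattern.compile("(?i)st|nd|rd|th");

    public static String join(String extracted) {
        String[] lines = extracted.split("\\R"); // split on any line break
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < lines.length) {
            String head = lines[i].replaceAll("\\s+$", ""); // drop trailing spaces
            boolean suffixFollows = i + 1 < lines.length
                    && ENDS_WITH_DIGIT.matcher(head).matches()
                    && ORDINAL_SUFFIX.matcher(lines[i + 1].trim()).matches();
            if (suffixFollows) {
                // Glue the suffix directly onto the digit: "5" + "th".
                StringBuilder merged = new StringBuilder(head).append(lines[i + 1].trim());
                i += 2;
                // Assumption: the next line is the rest of the same sentence.
                if (i < lines.length && !lines[i].trim().isEmpty()) {
                    merged.append(' ').append(lines[i].trim());
                    i++;
                }
                out.add(merged.toString());
            } else {
                out.add(lines[i]);
                i++;
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String raw = "THIS AGREEMENT is made as of the 5\nth\nday of February, 2016.";
        // prints: THIS AGREEMENT is made as of the 5th day of February, 2016.
        System.out.println(join(raw));
    }
}
```

      This heuristic only looks at the text, so it can merge lines that were never a single line in the PDF, for example a line ending in a number followed by a line containing just "St". It treats the symptom described here and is no substitute for correct superscript handling during extraction.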

        Attachments

        1. PDFBOX-4000-reduced.pdf
          2 kB
          Tilman Hausherr
        2. nk7-p19.pdf
          470 kB
          Tilman Hausherr
        3. contract_00968_SEDAR.pdf
          283 kB
          Harun Reşit Zafer
        4. contract_00882_SEDAR.pdf
          332 kB
          Harun Reşit Zafer
        5. contract_00569_SEDAR-marked-1.png
          320 kB
          Tilman Hausherr
        6. contract_00569_SEDAR-experimental.txt
          32 kB
          Tilman Hausherr
        7. contract_00569_SEDAR.pdf
          200 kB
          Harun Reşit Zafer

            People

            • Assignee:
              Unassigned
            • Reporter:
              Harun Reşit Zafer (hrzafer)
            • Votes:
              0
            • Watchers:
              2
