  1. PDFBox
  2. PDFBOX-4864

Different behaviour when extracting "-" on different platforms

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.0.19
    • Fix Version/s: 2.0.19
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      macOS 11 / Windows 7 / Ubuntu 20

      Description

      I find that PDFBox behaves differently when extracting the symbol "-" on Windows 7, macOS 11, and Ubuntu 20.

      For the attached file 2020032000583_page5.pdf, at about line three:

      "Lease liabilities 11,528 –", I can successfully extract the symbol "-", but it is missing on macOS and Ubuntu.

       

      A similar case occurs in the attached file 2020033001335_page3.pdf, at the line:

      "Lease liabilities 3,345 –"

      and in the attached file ltn20190828635_page3.pdf, at the line:

      "Lease liabilities 3,943 –"

      Do you have any idea what causes this, and is there a workaround?
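
One way to narrow this down is to check whether the dash is missing from the extracted string itself or is lost later, when the text is printed or written using the platform default charset. The following is a minimal sketch under stated assumptions: `DashCheck`, `codePoints`, and `survives` are hypothetical helper names, the sample line stands in for text returned by `PDFTextStripper.getText(...)`, and the trailing symbol is assumed to be U+2013 (en dash), as it appears in the quoted lines. Nothing in this report confirms that encoding is the cause; the sketch only helps rule it in or out.

```java
import java.nio.charset.Charset;

public class DashCheck {

    /** Returns every code point of the text as "U+XXXX", separated by spaces. */
    public static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** True if the text is unchanged after an encode/decode round trip in the given charset. */
    public static boolean survives(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        // Stand-in for one line of extracted text (assumption: in real use this
        // would come from PDFTextStripper.getText(document)).
        String line = "Lease liabilities 11,528 \u2013";

        // If U+2013 is listed here, the extracted string does contain the dash.
        System.out.println(codePoints(line));

        // If this prints false, the dash is lost when the text is written
        // with the platform default charset.
        Charset platform = Charset.defaultCharset();
        System.out.println(platform + " keeps the dash: " + survives(line, platform));
    }
}
```

Running the same check on each platform against the real extracted line would show whether the difference lies in the extraction result or in how the output is encoded afterwards.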

        Attachments

        1. ltn20190828635_page3.pdf
          21 kB
          William Au Yeung
        2. 2020033001335_page3.pdf
          24 kB
          William Au Yeung
        3. 2020032000583_page5.pdf
          20 kB
          William Au Yeung

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              williamay William Au Yeung
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: