  1. PDFBox
  2. PDFBOX-4864

Different behaviour when extracting "-" on different platforms


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.0.19
    • Fix Version/s: 2.0.19
    • Component/s: Text extraction
    • Labels: None
    • Environment: macOS 11 / Windows 7 / Ubuntu 20

    Description

      I find that PDFBox behaves differently when extracting the symbol "-" on Windows 7, macOS 11 and Ubuntu 20.

      In the attached file 2020032000583_page5.pdf, at about line three:

      "Lease liabilities 11,528 –", I can successfully extract the symbol "-" on Windows 7, but it is missing on macOS and Ubuntu.

       

      A similar case occurs in the attached file 2020033001335_page3.pdf, at the line:

      "Lease liabilities 3,345 –"

      and in the attached file ltn20190828635_page3.pdf, at the line:

      "Lease liabilities 3,943 –"

      Do you have any idea what causes this, and is there a workaround?
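
      One way to narrow this down is to compare the extracted line by code point on each platform, rather than by how it looks in a console or an editor. The sketch below is only a diagnostic under assumptions: the class name CodePointDump and the method describe are hypothetical, and the sample string stands in for a line already obtained from PDFBox text extraction. Note that the character in the quoted lines is an en dash (U+2013), not an ASCII hyphen (U+002D). The issue does not confirm a cause; an output encoding that differs between platforms is only one possibility that this check would help rule in or out.

```java
import java.util.StringJoiner;

public class CodePointDump {
    // Returns every code point of the text as "U+XXXX", separated by spaces.
    // The result is pure ASCII, so it survives any console or file encoding.
    public static String describe(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: in real use this string would be one line of the text
        // extracted from the PDF; here it is a hard-coded sample.
        String line = args.length > 0 ? args[0] : "Lease liabilities 11,528 \u2013";
        System.out.println(describe(line));
    }
}
```

      If the dump ends in U+2013 on every platform, the extraction itself is consistent and the difference arises later, for example when the text is printed or written with the platform default charset.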


      Attachments

        1. ltn20190828635_page3.pdf
          21 kB
          William Au Yeung
        2. 2020033001335_page3.pdf
          24 kB
          William Au Yeung
        3. 2020032000583_page5.pdf
          20 kB
          William Au Yeung

          People

            Assignee: Unassigned
            Reporter: William Au Yeung (williamay)
            Votes: 0
            Watchers: 2
