  1. PDFBox
  2. PDFBOX-3986

Bounding boxes of mathematical symbols are not correct

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.7
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Windows 7 (64 bit)
    • Flags: Important

    Description

      Hello Support Team,

      I am working on a task where I have to extract formulas from a PDF document and convert them into images.

      But when I extract them using PDFBox, some symbols such as summation, integral, or large parentheses get mixed up with the previous line.

      I checked the output of the DrawPrintTextLocations example with that particular PDF document, and the result does not look right.
      The red boxes are not aligned properly in the output, as you can see in the attached files.
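
      To make the reported symptom concrete, here is a minimal, self-contained Java sketch of how a tall glyph box can reach into the previous line. It does not use PDFBox at all: the `Box` class, the `intrudingGlyphs` helper, and all coordinates are hypothetical and chosen only for illustration, assuming a top-left origin with y growing downward. It shows the overlap effect, not its cause in this particular document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlyphBoxOverlap {

    /** Hypothetical glyph box; (x, y) is the top-left corner and y grows downward. */
    static final class Box {
        final String text;
        final double x, y, width, height;

        Box(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        double bottom() {
            return y + height;
        }
    }

    /** True when the vertical extents of the two boxes intersect. */
    static boolean overlapsVertically(Box a, Box b) {
        return a.y < b.bottom() && b.y < a.bottom();
    }

    /** Returns the glyphs of `line` whose box reaches into any box of `previousLine`. */
    static List<Box> intrudingGlyphs(List<Box> previousLine, List<Box> line) {
        List<Box> result = new ArrayList<>();
        for (Box glyph : line) {
            for (Box above : previousLine) {
                if (overlapsVertically(glyph, above)) {
                    result.add(glyph);
                    break; // report each glyph only once
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up boxes: an ordinary text line followed by a formula line.
        List<Box> previous = Arrays.asList(
                new Box("a", 100, 100, 6, 10),
                new Box("b", 108, 100, 6, 10));
        List<Box> current = Arrays.asList(
                new Box("\u2211", 100, 105, 12, 30), // tall summation sign
                new Box("x", 114, 118, 6, 10));

        for (Box b : intrudingGlyphs(previous, current)) {
            System.out.println(b.text + " reaches into the previous line");
        }
    }
}
```

      A check like this, run on the boxes that an extraction step reports, can help tell apart glyphs that are genuinely tall from boxes that are simply misplaced.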

      I am attaching the output for two pages and the PDF document itself.

      Please refer to page 34 or 37 for this issue.

      Thank you in advance!

      Attachments

        1. PDFBOX-3986-reduced.pdf
          11 kB
          Tilman Hausherr
        2. formula-marked-37.png
          351 kB
          Navnath Kumbhar
        3. formula-marked-34.png
          309 kB
          Navnath Kumbhar
        4. formula.pdf
          1017 kB
          Navnath Kumbhar

          People

            Assignee: Unassigned
            Reporter: Navnath Kumbhar (Navnath@3DS)
            Votes: 0
            Watchers: 3
