PDFBox / PDFBOX-3986

Bounding box of mathematical symbols are not proper

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.7
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      Windows 7 (64 bit)
    • Flags:
      Important

      Description

      Hello Support Team,

      I am working on a task in which I have to extract formulas from a PDF document and convert them into images.

      However, when I extract them using PDFBox, some symbols, such as summation signs, integral signs, or large parentheses, get mixed up with the preceding line.

      I checked the output of the DrawPrintTextLocations example for this PDF document, and the result does not look right.
      The red boxes are not aligned properly in the output, as you can see in the attached files.
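
For the image-conversion step of the task described above, a minimal sketch of cropping a formula region out of an already rendered page image might look like the following. It is not taken from PDFBox: `RegionCropper`, `crop`, and their parameters are hypothetical names used only for illustration. The sketch assumes the region is given in PDF points (1/72 inch) with the origin at the top-left corner of the page, and that the page image was rendered at a known DPI (for example with PDFBox's `PDFRenderer.renderImageWithDPI`).

```java
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class RegionCropper {
    // PDF user space uses 72 units per inch by default.
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Crops a region out of a rendered page image.
     * Assumption: regionPts is in points, top-left origin, y growing downward.
     */
    static BufferedImage crop(BufferedImage page, Rectangle2D regionPts, double dpi) {
        double scale = dpi / POINTS_PER_INCH;
        // Round outward so thin strokes at the edge are not cut off,
        // then clamp to the image bounds.
        int x0 = Math.max(0, (int) Math.floor(regionPts.getMinX() * scale));
        int y0 = Math.max(0, (int) Math.floor(regionPts.getMinY() * scale));
        int x1 = Math.min(page.getWidth(), (int) Math.ceil(regionPts.getMaxX() * scale));
        int y1 = Math.min(page.getHeight(), (int) Math.ceil(regionPts.getMaxY() * scale));
        if (x1 <= x0 || y1 <= y0) {
            throw new IllegalArgumentException("Region lies outside the page image");
        }
        return page.getSubimage(x0, y0, x1 - x0, y1 - y0);
    }
}
```

Because the crop is only as good as the rectangle passed in, a box that is shifted into the previous line would produce an image of the wrong area, which is why the box alignment matters for this task.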

      I am attaching the output for two pages, along with the PDF document itself.

      Please refer to page 34 or 37 for this issue.

      Thank you in advance!

        Attachments

        1. PDFBOX-3986-reduced.pdf
          11 kB
          Tilman Hausherr
        2. formula.pdf
          1017 kB
          Navnath Kumbhar
        3. formula-marked-37.png
          351 kB
          Navnath Kumbhar
        4. formula-marked-34.png
          309 kB
          Navnath Kumbhar

            People

            • Assignee:
              Unassigned
              Reporter:
              Navnath@3DS Navnath Kumbhar
            • Votes:
              0
              Watchers:
              3