  1. PDFBox
  2. PDFBOX-3674

Incorrect ordering of fatha -- potentially indicative of larger issue with RTL

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Text extraction

    Description

      On TIKA-2257, ccreutzig shared a file that triggers PDFBox to flip the order of the fatha. I suspect this is happening in normalizeAdd within PDFTextStripper, but I'm not familiar enough with the code to diagnose and fix.

      I confirmed this is still happening in trunk.

      The triggering file and the start of a diagnosis are available on the Tika issue (TIKA-2257).
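
      For anyone trying to confirm the symptom on extracted text, here is a minimal sketch of a check. It assumes the flip shows up as a combining mark such as the fatha (U+064E) landing before its base letter in the logical-order string, which is only one possible reading of the problem. The class MarkOrderCheck and the method misplacedMarks are hypothetical names for illustration. They are not part of PDFBox or Tika, and the sketch uses only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox or Tika.
final class MarkOrderCheck {

    // Returns the indices of non-spacing marks (e.g. fatha, U+064E) that are
    // not preceded by a letter or by another non-spacing mark.
    // Assumption: in correct logical order a mark always follows its base.
    static List<Integer> misplacedMarks(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                continue; // only combining marks are of interest
            }
            if (i == 0 || !isBaseOrMark(text.charAt(i - 1))) {
                result.add(i); // mark at start of text or after a non-letter
            }
        }
        return result;
    }

    private static boolean isBaseOrMark(char c) {
        return Character.isLetter(c)
                || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        String logical = "\u0628\u064E"; // beh followed by fatha
        String flipped = "\u064E\u0628"; // fatha placed before beh
        System.out.println(misplacedMarks(logical));
        System.out.println(misplacedMarks(flipped));
    }
}
```

      Running the check over the output of PDFTextStripper for the attached file would flag marks at the start of a word. It would not detect a fatha that ends up attached to the wrong neighbouring letter, so an empty result does not prove the order is correct.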

      Attachments

        1. PDFBOX-3674-reduced.pdf
          19 kB
          Tilman Hausherr

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 4
