PDFBOX-3674 (PDFBox): Incorrect ordering of fatha, potentially indicative of a larger issue with RTL text

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:

    Description

    On TIKA-2257, Christopher Creutzig shared a file that causes PDFBox to flip the order of the fatha (U+064E ARABIC FATHA, a combining diacritic) in the extracted text. I suspect this happens in normalizeAdd within PDFTextStripper, but I'm not familiar enough with that code to diagnose and fix it.

    I have confirmed that this still happens on trunk.

    The triggering file and the start of a diagnosis are available on the Tika issue.
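    To make the suspected failure mode concrete, here is a hypothetical Java sketch. It is NOT PDFBox code and does not claim to reproduce what normalizeAdd actually does; it only assumes, for illustration, that RTL text gets reversed one UTF-16 unit at a time. Under that assumption a combining fatha is detached from its base letter, whereas reversing whole grapheme clusters (via java.text.BreakIterator) keeps it attached. The class and method names are made up for this example.

```java
import java.text.BreakIterator;

// Hypothetical demo class; not part of PDFBox.
public class FathaOrderDemo {

    /** Naive reversal: reverses UTF-16 units, so a combining mark is separated from its base letter. */
    static String naiveReverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Cluster-aware reversal: reverses user-perceived characters (base letter plus its marks) as units. */
    static String clusterReverse(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        StringBuilder out = new StringBuilder(s.length());
        int end = it.last();
        // Walk boundaries from the end of the string back to the start, appending each cluster whole.
        for (int start = it.previous(); start != BreakIterator.DONE; end = start, start = it.previous()) {
            out.append(s, start, end);
        }
        return out.toString();
    }

    /** Formats a string as a list of code points, e.g. "U+0628 U+064E". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String word = "\u0628\u064E\u062A"; // ARABIC LETTER BEH, ARABIC FATHA, ARABIC LETTER TEH
        System.out.println("input:   " + codePoints(word));
        System.out.println("naive:   " + codePoints(naiveReverse(word)));
        System.out.println("cluster: " + codePoints(clusterReverse(word)));
    }
}
```

    With the input BEH, FATHA, TEH, the naive reversal yields TEH, FATHA, BEH, so the fatha now follows (and would render on) the teh instead of the beh. The cluster-aware version yields TEH, BEH, FATHA, keeping the fatha on its original letter.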

    People

    • Assignee: Andreas Lehmkühler (lehmi)
    • Reporter: Tim Allison (tallison@apache.org)
    • Votes: 0
    • Watchers: 4
