Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.10, 1.8.11, 2.0.0
    • Fix Version/s: 1.8.11, 2.0.0
    • Component/s: Text extraction
    • Labels:

      Description

      Text extraction by beads has never worked or, more likely, was broken years ago when (or if) the code was changed so that text positions are in image coordinates (y=0 is at the top) rather than in PDF coordinates (y=0 is at the bottom).

      todos:

      • adjust bead rectangles (done)
      • adjust for cropbox (done)
      • separate output from different beads with a newline (will open a separate issue if I don't find a solution)
      • optimize (done)
      • implement in 1.8.11
      • find a non-copyrighted test file (done)
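
      The first two todo items (adjusting bead rectangles and adjusting for the cropbox) can be illustrated with a minimal sketch. This is not the actual PDFBox patch: the class BeadRectFlip and the method toTopLeft are hypothetical names, the sketch works on plain floats rather than PDFBox types, and it assumes an unrotated page. It only shows the idea of moving a rectangle from PDF coordinates (y=0 at the bottom) to top-left-origin coordinates relative to the cropbox, so that it can be compared against text positions expressed in image coordinates.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, for illustration only; not part of PDFBox.
class BeadRectFlip {

    /**
     * Converts a rectangle given in PDF coordinates (origin bottom-left, y grows upward)
     * into coordinates with the origin at the top-left corner of the crop box
     * (y grows downward). Assumes the page is not rotated.
     *
     * @param llx     lower-left x of the bead rectangle
     * @param lly     lower-left y of the bead rectangle
     * @param urx     upper-right x of the bead rectangle
     * @param ury     upper-right y of the bead rectangle
     * @param cropLlx lower-left x of the crop box
     * @param cropUry upper-right y of the crop box
     */
    static Rectangle2D.Float toTopLeft(float llx, float lly, float urx, float ury,
                                       float cropLlx, float cropUry) {
        float x = llx - cropLlx;      // shift so the crop box starts at x = 0
        float y = cropUry - ury;      // distance from crop box top to rectangle top
        float width = urx - llx;
        float height = ury - lly;
        return new Rectangle2D.Float(x, y, width, height);
    }

    public static void main(String[] args) {
        // A bead rectangle on a page whose crop box starts at x = 50 and ends at y = 800.
        Rectangle2D.Float r = toTopLeft(100, 600, 300, 700, 50, 800);
        System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);

        // A text position already in top-left coordinates can now be tested directly.
        System.out.println(r.contains(120f, 150f));
    }
}
```

      With the rectangle flipped once up front, each text position can be checked against it with a plain contains() call instead of converting every position back to PDF coordinates.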

        Attachments

        1. 003422-1.pdf
          138 kB
          Tilman Hausherr
        2. 003422-marked-1.png
          115 kB
          Tilman Hausherr
        3. 003422-1-bad.txt
          5 kB
          Tilman Hausherr
        4. 003422-1-good.txt
          5 kB
          Tilman Hausherr
        5. poems.pdf
          45 kB
          Maruan Sahyoun
        6. poems-marked-1.png
          653 kB
          Tilman Hausherr
        7. poems-marked-2.png
          437 kB
          Tilman Hausherr
        8. PDFBOX-3110-poems-beads-good.txt
          5 kB
          Tilman Hausherr
        9. PDFBOX-3110-poems-beads-bad.txt
          5 kB
          Tilman Hausherr

            People

            • Assignee: Tilman Hausherr
            • Reporter: Tilman Hausherr
            • Votes: 0
            • Watchers: 4
