PDFBox

Text extraction

Summary

Issues: Unresolved

Type Key Summary Due Date
Bug PDFBOX-2252 PDFTextStripper has problem with bilingual documents
Bug PDFBOX-448 Columns in text not extracted separately
Bug PDFBOX-495 PDFTextStripperByArea extracts text only from 1 region, despite several regions being defined

Issues: Updated recently

Type Key Summary Updated
Bug PDFBOX-915 some pdf file for chinese can't extracted by correct encode
Bug PDFBOX-2508 Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts
Bug PDFBOX-2158 ExtractText missing most of text in this PDF file, due to font bounding box with minus infinity

Versions: Unreleased

Name Release date
Unreleased 1.8.9  
Unreleased 2.0.0  
Unreleased 2.1.0  
Unreleased 3.0.0