Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.0.0, 2.0.1, 2.0.2, 2.0.3, 3.0.0 PDFBox
-
None
Description
The ability to extract a line of text as it appears in the PDF is no longer working in the 2.x version of pdfbox.
java -jar pdfbox-app-1.8.4.jar ExtractText -console -sort ~/Desktop/text-extraction-issues.pdf
results in:
. . . Your Code Our Code Description Qty Price Ex Total Ex 11SP 100129630 IRWIN VICE-GRIP 11 C-CLAMP SWIVEL PAD 4 00.00 000.00 IR-0352 100094584 IRWIN 600MM TOOL BAG 1 00.00 00.00 EM81.9 100088913 EMPIRE TORPEDO LEVEL ALUMINIUM 1 00.00 00.00 20566-618R 100023443 LENOX RECIPRO BLADE 150X20X0.9MM 18TPI 5P 3 0.00 00.00 . . .
while
java -jar pdfbox-app-2.0.2.jar ExtractText -console -sort ~/Desktop/text-extraction-issues.pdf
results in:
. . . Your Code Our Code Description Qty Price Ex Total Ex IRWIN VICE-GRIP 11 C-CLAMP SWIVEL PAD 11SP 100129630 4 00.00 000.00 IRWIN 600MM TOOL BAG IR-0352 100094584 1 00.00 00.00 EMPIRE TORPEDO LEVEL ALUMINIUM EM81.9 100088913 1 00.00 00.00 LENOX RECIPRO BLADE 150X20X0.9MM 18TPI 5P 20566-618R 100023443 3 0.00 00.00 . . .