Details
Bug
Status: Closed
Major
Resolution: Not A Bug
2.0.26
None
None
Windows 11 + Intellij + Spark3.12 + scala2.12
Patch
Description
With PDFBox version 2.0.6, the following code extracts the text from the attached PDF file:
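import java.io.File
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

// Extract the text of page 1. The original snippet is cut off; returning None on IOException is an assumption.
def getTextFromPdf(filename: String): Option[String] = {
  try {
    val doc: PDDocument = PDDocument.load(new File(filename))
    try {
      val stripper = new PDFTextStripper
      stripper.setStartPage(1)
      stripper.setEndPage(1)
      Some(stripper.getText(doc))
    } finally doc.close() // always release the document
  } catch {
    case _: java.io.IOException => None
  }
}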
Output:
...........
- (1) Written Premium Collected by the Bank 0.00US$ 0.00US$ 0.00US$ 0.00US$ 0.00US$ 0.00US$
(2) Increase (Decrease) in Uearned Premium Reserve 0.00US$ (72.04)US$ (72.04)US$ 0.00US$ (272.31)US$ (272.31)US$
(3) Earned Premium ((Reinsurance Premium) (1)- (2)) 0.00US$ 72.04US$ 72.04US$ 0.00US$ 272.31US$ 272.31US$
(4) Currency Tax (Impuesto Divisas) [2% of (3)] 0.00US$ 1.44US$ 1.44US$ 0.00US$ 5.45US$ 5.45US$
(5) Ceding Allowance [5.8% of (3)] $ 0.00 0.00US$ 4.18US$ 4.18US$ 0.00US$ 15.79US$ 15.79US$
.........
Expected: all the money fields should be in the correct order, like:
- Written Premium Collected by the Bank US$ 0.00 US$0.00 US$0.00 US$0.00 US$0.00 US$0.00
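Since the issue was closed as Not A Bug, one possible workaround is to post-process the extracted text. The following is only a minimal sketch, assuming the currency marker always appears as a literal "US$" immediately after the amount, as in the output above. The object `CurrencyOrder`, the function `fixCurrencyOrder`, and the regular expression are hypothetical names introduced for illustration and are not part of PDFBox.

```scala
import scala.util.matching.Regex

object CurrencyOrder {
  // Hypothetical pattern: an amount such as 0.00 or (72.04)
  // directly followed by the literal text "US$".
  private val amountThenCurrency: Regex =
    """(\(?\d[\d,]*\.\d{2}\)?)US\$""".r

  // Move the trailing "US$" in front of the amount it follows.
  // quoteReplacement is needed because "$" is special in replacement strings.
  def fixCurrencyOrder(text: String): String =
    amountThenCurrency.replaceAllIn(
      text,
      m => Regex.quoteReplacement("US$" + m.group(1))
    )
}
```

This could be applied to the string returned by `getTextFromPdf`, for example `getTextFromPdf(filename).map(CurrencyOrder.fixCurrencyOrder)`. It does not handle the stray "$ 0.00" seen in line (5) of the output.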