Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
1.8.10, 1.8.11, 2.0.0
None
All
Description
Hi,
I am parsing a very complicated PDF, for which I had to enable sorting by position (setSortByPosition set to true); otherwise the parser cannot extract the text sequentially.
So I decided to use the PDFTextStripperByArea class and define rectangles to extract text from. The problem is that if I define many rectangles on a single page, the extracted text again has no logical sequence. To get around this, it would be great to have a method for removing regions: we could then add a region, extract its text, remove that region, add a new region, and so on.
I have already done a proof of concept on my local machine and it works fine. I added the following method and tested it:
    public void removeRegion(String regionName) {
        this.regions.remove(regionName);
        this.regionArea.remove(regionName);
    }
I can contribute this code myself if you agree; please let me know. Thanks and regards,
Praveer
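
For illustration, the add / extract / remove cycle described above could be sketched as follows. This is a minimal, self-contained stand-in and not the PDFBox class itself: RegionRegistry and processInOrder are hypothetical names, and the fields only mirror the regions and regionArea members used in the proposed method. In real code, the marked step would presumably call extractRegions(page) and getTextForRegion(name) on a PDFTextStripperByArea instance.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the region bookkeeping inside PDFTextStripperByArea.
class RegionRegistry {
    private final List<String> regions = new ArrayList<>();
    private final Map<String, Rectangle2D> regionArea = new java.util.HashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.add(regionName);
        regionArea.put(regionName, rect);
    }

    // Same body as the method proposed in this issue.
    void removeRegion(String regionName) {
        regions.remove(regionName);
        regionArea.remove(regionName);
    }

    // Defensive copy, so callers cannot modify the internal list.
    List<String> getRegions() {
        return new ArrayList<>(regions);
    }

    // Handles the rectangles one at a time, so that only a single region is
    // registered during each extraction. Pass an ordered map (for example a
    // LinkedHashMap) so the iteration order is the order the caller intends.
    static List<String> processInOrder(Map<String, Rectangle2D> wanted) {
        RegionRegistry registry = new RegionRegistry();
        List<String> visited = new ArrayList<>();
        for (Map.Entry<String, Rectangle2D> entry : wanted.entrySet()) {
            registry.addRegion(entry.getKey(), entry.getValue());
            // Assumption: real code would extract and read the region text here.
            visited.addAll(registry.getRegions());
            registry.removeRegion(entry.getKey());
        }
        return visited;
    }
}
```

Because each region is removed before the next one is added, the order of the results is determined by the caller's loop rather than by how several regions on the same page happen to be processed together.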