Details
- Improvement
- Status: Resolved
- Trivial
- Resolution: Won't Fix
- 1.18
- None
- MacOS Sierra 10.12.6
Description
When text from a table is extracted, sometimes the order of the cells becomes mixed and the words get concatenated together. For example:
HOURS | DUR (hr) | PHASE | CODE | SUB | DESCRIPTION |
---|---|---|---|---|---|
becomes: Hours Dur Code Sub DescriptionPhase
In other, more serious cases, the text within a cell becomes scrambled with text from another cell. For example:
HOURS | DUR (hr) | PHASE | CODE | SUB |
---|---|---|---|---|
00:00 - 17:00 | 17.00 | FLOWBK | 33 P - FLOWBACK / TESTING | E - RIG OUT TESTERS |
The second row becomes:
17.00-00:00 17:00 FLOWBK E - RIG OUT
TESTERS
33 P -
FLOWBACK /
TESTING
Note that the value of the second column has moved to the first column, and the "-" within the first column is misordered. The last two columns have switched places.
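The misordering described above can be checked mechanically, for instance in a regression test. The following is a minimal sketch, not part of Tika: the helper names `normalize` and `first_misordered_cell` are hypothetical, and it assumes the expected cell values are known in advance. It collapses whitespace in the extracted text and then looks for each expected cell in left-to-right order.

```python
import re
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Collapse every run of whitespace (including line breaks) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def first_misordered_cell(extracted: str, cells: Sequence[str]) -> Optional[str]:
    """Return the first expected cell that does not appear after the
    previous expected cell in the extracted text, or None if all cells
    appear intact and in order. (Hypothetical helper, not a Tika API.)"""
    haystack = normalize(extracted)
    pos = 0
    for cell in cells:
        needle = normalize(cell)
        found = haystack.find(needle, pos)
        if found == -1:
            return cell
        pos = found + len(needle)
    return None


# Cell values of the data row from the report, in table order.
expected_row = [
    "00:00 - 17:00",
    "17.00",
    "FLOWBK",
    "33 P - FLOWBACK / TESTING",
    "E - RIG OUT TESTERS",
]

# Extracted text as quoted in the report.
scrambled = "17.00-00:00 17:00 FLOWBK E - RIG OUT\nTESTERS\n33 P -\nFLOWBACK /\nTESTING"

print(first_misordered_cell(scrambled, expected_row))
```

For the scrambled text from the report, the first cell is already reported as broken, because "00:00 - 17:00" no longer appears as a contiguous string. The check only detects damage; it does not say which extraction setting caused it.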
Issue Links
- is caused by: TIKA-2249 Tika not able to parse tables from pdf (Open)