[TIKA-1442] Upgrade to PDFBox 1.8.8 - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.7
Component/s: None
Labels: None

Description

Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 1.8.8 as soon as it is ready. I'm tempted to call this a blocker on Tika 1.7. Let's use this issue to carry on the discussion of regression testing (if any further discussion is necessary) or any other prep that needs to happen before 1.8.8's release.

Attachments

- pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx (16/Oct/14 17:22, 2.89 MB, Tilman Hausherr)
- pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx (22/Oct/14 11:18, 7.88 MB, Tim Allison)
- pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx (23/Oct/14 01:29, 9.66 MB, Tim Allison)
- pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip (23/Oct/14 21:16, 9.57 MB, Tilman Hausherr)
- PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx (25/Nov/14 15:31, 9.68 MB, Tim Allison)
- PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx (25/Nov/14 16:00, 43 kB, Tim Allison)
- PDFBox_1_8_6VPDFBox_1_8_8-b145.zip (25/Nov/14 20:46, 8.56 MB, Tilman Hausherr)
- PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx (30/Nov/14 22:48, 40 kB, Tilman Hausherr)
- PDFBox_1_8_6DVPDFBox_1_8_8-TRAD-b156.xlsx (01/Dec/14 21:57, 44 kB, Tim Allison)
- PDFBox_1_8_8-TRADVPDFBox_1_8_8-NONSEQ-b156.xlsx (01/Dec/14 21:57, 39 kB, Tim Allison)
- PDFBox_1_8_6VPDFBox_1_8_8-CLASSIC-b162.xlsx (02/Dec/14 19:38, 206 kB, Tim Allison)
- PDFBox_1_8_8-CLASSICVPDFBox_1_8_8-NONSEQ-b162.xlsx (02/Dec/14 19:38, 33 kB, Tim Allison)
- PDFBox_1_8_8-CLASSICVPDFBox_1_8_8-NONSEQ-b162.xlsx (02/Dec/14 22:49, 35 kB, Tilman Hausherr)
- PDFBox_1_8_6VPDFBox_1_8_8-CLASSIC-b162.xlsx (03/Dec/14 21:09, 2.74 MB, Tilman Hausherr)

Issue Links

is related to

- PDFBOX-2385 inline image with EI at the end incorrectly parsed (Closed)
- PDFBOX-2421 Poor text extraction and rendering of file with non embedded type1 font (Closed)
- PDFBOX-2449 Character missing in text extraction (Closed)
- PDFBOX-2493 OOM with corrupt PDF file (Closed)
- PDFBOX-2523 IOException: Error: Expected a long type at offset 1218571, instead got 'xref' (Closed)
- PDFBOX-2533 Poor rendering with non-sequential parser (Closed)
- PDFBOX-2534 Less pages shown with the non-sequential parser (Closed)
- PDFBOX-2376 Small regression in text extraction with PDFBox 1.8.7 vs. 1.8.6 (Closed)
- PDFBOX-2377 Apparent regression in character mapping in a few files from govdocs1 (Closed)
- PDFBOX-2527 IOException: Negative seek offset in NonSequentialPDFParser (Closed)
- PDFBOX-2528 IOException: Object must be defined and must not be compressed object: 0:0 (Closed)
- TIKA-1419 Upgrade to PDFBox 1.8.7 (Closed)

relates to

- TIKA-1467 pdf:encrypted:false with encrypted pdf (Open)

People

Assignee: Tim Allison

Reporter: Tim Allison

Votes: 0

Watchers: 7

Dates

Created: 10/Oct/14 12:28

Updated: 16/Dec/16 16:05

Resolved: 15/Dec/14 16:18