[TIKA-1318] Use of Deprecated Word6Extractor.getParagraphText() Method - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.5
Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
Component/s: parser
Labels:
- deprecation

Description

org.apache.tika.parser.microsoft.WordExtractor.parseWord6() calls the deprecated Word6Extractor.getParagraphText() method, which is supposed to return a String[] with one element per paragraph of the text. The replacement is getText(), which returns the whole text as a single String and leaves the separation of paragraphs, cells, etc. implementation-specific. I'm not sure, at this point, how the POI WordExtractor separates them.
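
One possible shape for the migration, as a rough sketch only: if getText() turns out to separate paragraphs with line terminators (an assumption, not verified against POI, and exactly the open question above), the per-paragraph array could be rebuilt by splitting the returned String. The class ParagraphSplitSketch and the helper splitParagraphs are hypothetical names used for illustration; they are not part of Tika or POI, and the sample string stands in for the output of getText().

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphSplitSketch {

    // Hypothetical helper: rebuilds a per-paragraph array from a single String.
    // ASSUMPTION: paragraphs are separated by \r\n, \r, or \n. The issue does not
    // confirm which separator (if any) POI's getText() actually emits.
    static String[] splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        if (text == null) {
            return new String[0];
        }
        for (String p : text.split("\\r\\n|\\r|\\n")) {
            // Skip empty pieces produced by consecutive separators.
            if (!p.isEmpty()) {
                paragraphs.add(p);
            }
        }
        return paragraphs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the String that getText() would return.
        String sample = "First paragraph\r\nSecond paragraph\rThird paragraph";
        for (String p : splitParagraphs(sample)) {
            System.out.println(p);
        }
    }
}
```

Unlike getParagraphText(), this drops empty paragraphs and the separators themselves, so a real fix would need to check whether parseWord6() depends on either.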