Description
The listenForAllRecords argument is always reset to 'true', so the 'else' branch is never reached. This may cause incorrect text extraction when a file contains records of certain unsupported types (e.g. SharedFormula).
public void processFile(DirectoryNode root, boolean listenForAllRecords)
        throws IOException, SAXException, TikaException {
    // Set up listener and register the records we want to process
    HSSFRequest hssfRequest = new HSSFRequest();
    listenForAllRecords = true; // overwrites the caller's argument, so the else branch below is never taken
    if (listenForAllRecords) {
        hssfRequest.addListenerForAllRecords(formatListener);
    } else {
        hssfRequest.addListener(formatListener, BOFRecord.sid);
        hssfRequest.addListener(formatListener, EOFRecord.sid);
        hssfRequest.addListener(formatListener, DateWindow1904Record.sid);
        hssfRequest.addListener(formatListener, CountryRecord.sid);
        hssfRequest.addListener(formatListener, BoundSheetRecord.sid);
        hssfRequest.addListener(formatListener, SSTRecord.sid);
        hssfRequest.addListener(formatListener, FormulaRecord.sid);
        hssfRequest.addListener(formatListener, LabelRecord.sid);
        hssfRequest.addListener(formatListener, LabelSSTRecord.sid);
        hssfRequest.addListener(formatListener, NumberRecord.sid);
        hssfRequest.addListener(formatListener, RKRecord.sid);
        hssfRequest.addListener(formatListener, StringRecord.sid);
        hssfRequest.addListener(formatListener, HyperlinkRecord.sid);
        hssfRequest.addListener(formatListener, TextObjectRecord.sid);
        hssfRequest.addListener(formatListener, SeriesTextRecord.sid);
        hssfRequest.addListener(formatListener, FormatRecord.sid);
        hssfRequest.addListener(formatListener, ExtendedFormatRecord.sid);
        hssfRequest.addListener(formatListener, DrawingGroupRecord.sid);
        if (extractor.officeParserConfig.getIncludeHeadersAndFooters()) {
            hssfRequest.addListener(formatListener, HeaderRecord.sid);
            hssfRequest.addListener(formatListener, FooterRecord.sid);
        }
    }
    // ... (remainder of processFile not shown in the report)
}
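A minimal, self-contained sketch of one possible fix, assuming the intent is simply to honor the caller's argument. FakeRequest, registerSelected, buggy and fixed are hypothetical names introduced for illustration: FakeRequest stands in for POI's HSSFRequest so the example runs without POI on the classpath, and record types are plain strings (a subset of those in the else branch above) rather than the real *Record.sid constants.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerRegistrationSketch {

    // Hypothetical stand-in for POI's HSSFRequest: it only records which
    // registrations were made, so the sketch runs without POI.
    static final class FakeRequest {
        boolean allRecords = false;
        final List<String> types = new ArrayList<>();

        void addListenerForAllRecords() {
            allRecords = true;
        }

        void addListener(String recordType) {
            types.add(recordType);
        }
    }

    // A subset of the record types registered in the original else branch,
    // written as plain strings instead of the real *Record.sid constants.
    static void registerSelected(FakeRequest req) {
        String[] selected = {"BOF", "EOF", "SST", "Formula", "LabelSST", "Number"};
        for (String type : selected) {
            req.addListener(type);
        }
    }

    // Mirrors the reported bug: the parameter is overwritten before it is read.
    static FakeRequest buggy(boolean listenForAllRecords) {
        FakeRequest req = new FakeRequest();
        listenForAllRecords = true; // caller's choice is discarded here
        if (listenForAllRecords) {
            req.addListenerForAllRecords();
        } else {
            registerSelected(req);
        }
        return req;
    }

    // Possible fix: honor the argument. Declaring it final turns any
    // accidental reassignment into a compile-time error.
    static FakeRequest fixed(final boolean listenForAllRecords) {
        FakeRequest req = new FakeRequest();
        if (listenForAllRecords) {
            req.addListenerForAllRecords();
        } else {
            registerSelected(req);
        }
        return req;
    }

    public static void main(String[] args) {
        System.out.println(buggy(false).allRecords); // true: flag ignored
        System.out.println(fixed(false).allRecords); // false: selective branch taken
        System.out.println(fixed(false).types.contains("SharedFormula")); // false
    }
}
```

Marking the parameter final is a design choice rather than a requirement: it documents that the flag is input-only and lets the compiler reject a reintroduced reassignment. With the flag honored, the selective branch registers no listener for SharedFormula, so such records are not passed to the listener at all.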