Version 1.17

RELEASED

Start date not set

Released: 13/Dec/17

Release Notes

Apache Tika 1.17
Priority | Type | Key | Summary | Assignee | Status
Blocker | Bug | TIKA-2034 | Upgrade XMPCore to 5.1.3 | Tim Allison | Resolved
Blocker | Task | TIKA-2486 | Upgrade metadata-extractor to 2.10.1 | Unassigned | Resolved
Blocker | Bug | TIKA-2499 | Sonatype Nexus Auditor is reporting that Tika 1.13 is using a number of vulnerable Third party components. | Tim Allison | Resolved
Blocker | Bug | TIKA-2519 | Issue parsing multiple CHM files concurrently | Unassigned | Resolved
Critical | Sub-task | TIKA-1607 | Introduce new arbitrary object key/values data structure for persistence of Tika Metadata | Lewis John McGibbney | Open
Critical | Bug | TIKA-1829 | org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:92) NPE | Tim Allison | Open
Critical | Bug | TIKA-2447 | PSDParser creates unnecessary large byte array and discards it | Unassigned | Resolved
Major | Bug | TIKA-715 | Some parsers produce non-well-formed XHTML SAX events | Unassigned | Open
Major | New Feature | TIKA-774 | ExifTool Parser | Chris A. Mattmann | Open
Major | New Feature | TIKA-776 | ExifTool Embedder | Chris A. Mattmann | Open
Major | New Feature | TIKA-819 | Make Option to Exclude Embedded Files' Text for Text Content | Unassigned | Open
Major | Improvement | TIKA-894 | Add webapp mode for Tika Server, simplifies deployment | Unassigned | Open
Major | New Feature | TIKA-980 | MicrodataContentHandler for Apache Tika | Kenneth William Krugler | Open
Major | Improvement | TIKA-985 | Support for HTML5 elements | Unassigned | Open
Major | Bug | TIKA-987 | Embedded drawing (SHAPE MERGEFORMAT) sometimes not extracted | Unassigned | Open
Major | Improvement | TIKA-988 | We don't extract a placeholder for a Word document embedded in an Excel document | Unassigned | Open
Major | Improvement | TIKA-1059 | Better Handling of InterruptedException in ExternalParser and ExternalEmbedder | Unassigned | Open
Major | Improvement | TIKA-1108 | Represent individual slides in pptx | Unassigned | Open
Major | Sub-task | TIKA-1208 | Migrate Any23 mime contributions to Tika | Unassigned | Open
Major | Bug | TIKA-1276 | Missing embedded dependencies in tika-bundle | Unassigned | Reopened
Major | Improvement | TIKA-1308 | Support in memory parse mode(don't create temp file): to support run Tika in GAE | Unassigned | Open
Major | New Feature | TIKA-1328 | Translate Metadata and Content | Unassigned | Open
Major | Improvement | TIKA-1367 | Tika documentation should list tika-parsers parser dependencies | Unassigned | Resolved
Major | Bug | TIKA-1379 | error in Tika().detect for xml files with xades signature | Unassigned | Open
Major | Bug | TIKA-1390 | Create tika-example module | Unassigned | Open
Major | Improvement | TIKA-1425 | Automatic batching of Microsoft service calls | Lewis John McGibbney | Open
Major | Bug | TIKA-1454 | Extracting as HTML loses links in xlsx, ppt, and pptx files | Tim Allison | Resolved
Major | Bug | TIKA-1456 | Visual Sentiment API parser | Chris A. Mattmann | Open
Major | Improvement | TIKA-1465 | Implement extraction of non-global variables from netCDF3 and netCDF4 | Lewis John McGibbney | Open
Major | Bug | TIKA-1505 | chmparser breaks down when extracting from file of CHM format v3 | Unassigned | Open
Major | New Feature | TIKA-1518 | Docker with Tika Server | Dave Meikle | Reopened
Major | New Feature | TIKA-1540 | New Tika plugin for image based feature extraction using computer vision techniques | Lewis John McGibbney | Open
Major | Improvement | TIKA-1577 | NetCDF Data Extraction | Ann Burgess | Open
Major | New Feature | TIKA-1598 | Parser Implementation for Streaming Video | Lewis John McGibbney | Open
Major | New Feature | TIKA-1609 | Leverage Google's LibPhonenumber for enhanced phone number extraction and metadata modeling | Lewis John McGibbney | Open
Major | New Feature | TIKA-1616 | Tika Parser for GIBS Metadata | Lewis John McGibbney | Open
Major | Improvement | TIKA-1640 | Make ExternalParser support aliases for key names in extracted metadata | Chris A. Mattmann | Open
Major | Improvement | TIKA-1672 | Integrate tika-java7 component | Unassigned | Open
Major | New Feature | TIKA-1697 | Parser Implementation for AkomaNtoso Legal XML Documents | Lewis John McGibbney | Open
Major | Task | TIKA-1705 | Update ASM dependency to 5.0.4 | Dave Meikle | Reopened
Major | New Feature | TIKA-1724 | Create parser for .obo file format. | Lewis John McGibbney | Open
Major | Bug | TIKA-1788 | message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header | Tim Allison | Resolved
Major | Bug | TIKA-1800 | MediaType#parse does not decode escaped special characters | Unassigned | Open
Major | Improvement | TIKA-1840 | No way to link slide notes to slide in PPT output. | Chris A. Mattmann | Reopened
Major | Bug | TIKA-1952 | Access Date is getting modified while capturing the MetaData information using AutoDetectParser | Unassigned | Open
Major | Bug | TIKA-1953 | tika-server NullPointerException while processing rtfs | Tim Allison | Resolved
Major | New Feature | TIKA-1988 | Age Detection Tika Recogniser | Chris A. Mattmann | Reopened
Major | Improvement | TIKA-2262 | Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types | Chris A. Mattmann | Resolved
Major | New Feature | TIKA-2332 | Output SNOMED codes for CUIs in CTAKES output? | Chris A. Mattmann | Resolved
Major | Improvement | TIKA-2340 | Add explicit deps to tika-parsers which are currently used from transitive scope | Konstantin Gribov | Open
Major | Improvement | TIKA-2346 | Allow Office format parsers to exclude parsing shapes | Unassigned | Reopened
Major | Bug | TIKA-2347 | Underlined text is not decorated as such when extracting from word documents | Dave Meikle | Closed
Major | Improvement | TIKA-2355 | Cache trained mode while running ObjectRecognition server from Docker builds | Chris A. Mattmann | Resolved
Major | Bug | TIKA-2369 | Define a clean Recogniser interface: for objects from binary data; and for text classification | Chris A. Mattmann | Open
Major | Bug | TIKA-2385 | Tesseract OCR rotation.py not run | Dave Meikle | Resolved
Major | Wish | TIKA-2389 | Warn log level is pretty strong for missing JBIG2ImageReader | Unassigned | Resolved
Major | Bug | TIKA-2428 | EMFParser loops forever with corrupted files | Unassigned | Resolved
Major | Improvement | TIKA-2429 | Upgrade to POI 3.17-final when available | Unassigned | Resolved
Major | Improvement | TIKA-2430 | Add at least dev test capability to run Tika against fuzzed files | Tim Allison | Resolved
Major | Bug | TIKA-2433 | Tika 1.16 - Nullpointer Exception after update - Asking for help | Unassigned | Resolved
Major | Bug | TIKA-2435 | docx parser missing content when multiple body sections | Unassigned | Resolved
Major | Bug | TIKA-2438 | Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102 | Unassigned | Resolved
Major | Improvement | TIKA-2439 | Avoid NullPointerException in org.apache.tika.langdetect.OptimaizeLangDetector if models haven't been loaded | Unassigned | Resolved
Major | Bug | TIKA-2442 | Non-terminal interactive form fields not handled recursively | Unassigned | Resolved
Major | Bug | TIKA-2445 | Windows BAT / CMD detection | Unassigned | Resolved
Major | Improvement | TIKA-2449 | Enabling extraction of standard references from text | Giuseppe Totaro | Resolved
Major | Bug | TIKA-2454 | Emails extracted from PSTs detected as unexpected file types | Unassigned | Resolved
Major | Bug | TIKA-2456 | Emails extracted from MBOX not detected as rfc822 | Unassigned | Resolved
Major | Bug | TIKA-2459 | Missing text in .doc file (but can be extracted by POI) | Unassigned | Resolved
Major | Improvement | TIKA-2466 | Remove JAXB usage | Unassigned | Resolved
Major | Bug | TIKA-2470 | Illegal reflective Access -- more cleanup for Java 9 | Unassigned | Resolved
Major | Bug | TIKA-2483 | Using PackageParser in ForkParser causes NPE | Unassigned | Resolved
Major | Bug | TIKA-2497 | Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser | Unassigned | Resolved
Major | Sub-task | TIKA-2502 | Upgrade OpenNLP to 1.8.3 | Unassigned | Resolved
Major | Sub-task | TIKA-2503 | Try to upgrade httpclient to >=4.5.3 | Tim Allison | Resolved
Major | Sub-task | TIKA-2504 | Upgrade or remove plexus-utils | Unassigned | Resolved
Major | Bug | TIKA-2506 | Nullpointer in tika-dl test on windows | Bob Paulin | Resolved
Major | Bug | TIKA-2511 | Slowness parsing SQLite database file | Unassigned | Resolved
Major | Task | TIKA-2516 | Upgrade CFX version to > 3.0.13 | Unassigned | Resolved
Major | Bug | TIKA-2723 | Issue with parsing .mht container | Unassigned | Open
Major | Improvement | TIKA-3029 | to extract information from ppt formats along with tables and image content | Unassigned | Open
Minor | Improvement | TIKA-539 | Encoding detection is too biased by encoding in meta tag | Kenneth William Krugler | Reopened
Minor | New Feature | TIKA-1220 | Parser implementration for IFC files | Lewis John McGibbney | Open
Minor | Bug | TIKA-1295 | Make some Dublin Core items multi-valued | Tim Allison | Open
Minor | Bug | TIKA-1318 | Use of Deprecated Word6Extractor.getParagraphText() Method | Unassigned | Open
Minor | Sub-task | TIKA-1329 | Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser | Unassigned | Reopened
Minor | Improvement | TIKA-1366 | Update some of Tika Server services to support JAX-RS 2.0 AsyncResponse | Unassigned | Open
Minor | Sub-task | TIKA-1395 | Create embedded image extraction example | Unassigned | Open
Minor | Improvement | TIKA-1417 | Create Extract Embedded Images from PDFs Example | Unassigned | Open
Minor | New Feature | TIKA-1674 | Add example to show how to extract embedded files | Unassigned | Open
Minor | Improvement | TIKA-1688 | Tika Version in Metadata | Unassigned | Open
Minor | Bug | TIKA-1738 | ForkClient does not always delete temporary bootstrap jar | Unassigned | Open
Minor | Sub-task | TIKA-2400 | Standardizing current Object Recognition REST parsers | Chris A. Mattmann | Resolved
Minor | Sub-task | TIKA-2402 | Support all image formats in Object Recognition REST Parser | Chris A. Mattmann | Resolved
Minor | Improvement | TIKA-2431 | Upgrade to PDFBox 2.0.7 | Tim Allison | Resolved
Minor | Improvement | TIKA-2440 | Phonetic strings handling for multilingual environments. | Unassigned | Resolved
Minor | Improvement | TIKA-2448 | Handle phonetic strings in the SAX docx parser | Unassigned | Resolved
Minor | Bug | TIKA-2450 | OfficeParser.parse called for zero-byte file with .doc extension | Unassigned | Resolved
Minor | Improvement | TIKA-2451 | Detect image frame counts for tiff files | Unassigned | Resolved
Minor | Improvement | TIKA-2455 | Flag in metadata for alternative email bodies | Unassigned | Resolved
Minor | Bug | TIKA-2464 | No PIL found while running the docker image 'InceptionVideoRestDockerfile' | Chris A. Mattmann | Resolved
Minor | Bug | TIKA-2469 | False positives with x-ms-owner detection | Tim Allison | Resolved
Minor | Bug | TIKA-2478 | RFC822 includes redundant copies of the text | Tim Allison | Resolved
Minor | Improvement | TIKA-2485 | EncodingDetectors markLimits to be configurable | Tim Allison | Resolved
Minor | Improvement | TIKA-2492 | Remove pdfdebugger from tika | Unassigned | Closed
Minor | Bug | TIKA-2510 | Embedded MP3 file in PPTX document no longer identified | Tim Allison | Resolved
Minor | Improvement | TIKA-2512 | Add underline and strikethrough to SAX-based docx/pptx parsers | Unassigned | Resolved
Trivial | Improvement | TIKA-891 | Use POST in addition to PUT on method calls in tika-server | Chris A. Mattmann | Open
Trivial | Improvement | TIKA-2312 | [Mp3Parser] expose fields form ID3TagsAndAudio | Unassigned | Open
Trivial | Bug | TIKA-2426 | Fix locale-dependent test in xlsb unit test | Unassigned | Resolved
Trivial | Improvement | TIKA-2465 | Add explicit unit tests for xxe | Unassigned | Resolved
Trivial | Improvement | TIKA-2472 | Implement Metadata.hashCode | Sergey Beryozkin | Resolved
Trivial | Improvement | TIKA-2476 | Metadata.toString always returns a trailing space | Sergey Beryozkin | Resolved
Trivial | Improvement | TIKA-2489 | Upgrade to PDFBox 2.0.8 | Unassigned | Resolved
Trivial | Bug | TIKA-2490 | Turn off stderr warnings in Tika-app | Tim Allison | Resolved
Trivial | Bug | TIKA-2491 | Cannot use TikaConfig | Unassigned | Resolved
Trivial | Sub-task | TIKA-2501 | Upgrade jackson to 2.9.2 | Unassigned | Resolved
Trivial | Bug | TIKA-2521 | SAX-based docx/pptx should start a new line before second paragraph within a cell | Unassigned | Resolved
1-118 of 118