Tika (Key: TIKA)

Project Lead: Jukka Zitting
URL: http://lucene.apache.org/tika/
Description: Apache Tika issues

Change Log

Apache Tika 0.5
   Bug TIKA-268 FIXED HTMLParser omits necessary space characters when parsing table data Critical Resolved
   Bug TIKA-267 FIXED encrypted pdf files aren't handled properly Critical Resolved
   New Feature TIKA-320 FIXED Allow disabling language detection in AutoDetectParser Major Resolved
   Bug TIKA-274 FIXED CharsetDetector.setDeclaredEncoding has no effect Major Resolved
   Bug TIKA-273 FIXED Content encoding in HtmlParser Major Resolved
   New Feature TIKA-269 FIXED Ease of use -facade for Tika Major Resolved
   Bug TIKA-266 FIXED Empty tika-core jar Major Resolved
   Bug TIKA-319 FIXED HtmlParser - use encoding hint only if charset is supported Major Resolved
   Bug TIKA-304 FIXED HtmlParser could be easier to subclass Major Resolved
   Improvement TIKA-287 FIXED HtmlParser should resolve relative paths in <a href="xxx"> elements Major Resolved
   Improvement TIKA-314 FIXED Initial support for JPEG EXIF metadata extraction Major Resolved
   Bug TIKA-209 FIXED Language detection is weak. Major Resolved
   Bug TIKA-256 FIXED MSWord parser does not extract footnotes and comments Major Resolved
   New Feature TIKA-275 FIXED Parse context Major Resolved
   Bug TIKA-262 FIXED ParsingReader does not parse metadata for larger MS Office documents Major Resolved
   New Feature TIKA-295 FIXED Rough cut of mbox parser Major Resolved
   Improvement TIKA-277 FIXED Tika stand alone CLI --possibility to specify output encoding (--text) Major Resolved
   Bug TIKA-294 FIXED TikaCLI always uses System.in for input Major Resolved
   Bug TIKA-312 FIXED TikaCLI can't print metadata Major Resolved
   Improvement TIKA-285 FIXED Update media type registry to the latest httpd mime type database Major Resolved
   Improvement TIKA-284 FIXED Upgrade to POI 3.5-FINAL Major Resolved
   Improvement TIKA-310 FIXED Use TagSoup to parse HTML Major Resolved
   Improvement TIKA-281 FIXED Use repository.apache.org to deploy snapshots and releases Major Resolved
   Bug TIKA-305 FIXED XHTML href attributes end up in the wrong namespace Major Resolved
   Bug TIKA-293 FIXED XWPFWordExtractorDecorator does not extract bookmarks Major Resolved
   Bug TIKA-283 FIXED XWPFWordExtractorDecorator does not extract links in tables Major Resolved
   Bug TIKA-279 FIXED XWPFWordExtractorDecorator does not extract some headers/footers Major Resolved
   New Feature TIKA-302 FIXED patch: initial support for ePUB Major Resolved
   Improvement TIKA-296 FIXED Automatically set the supertype for "+xml" mimetypes Minor Resolved
   Bug TIKA-311 FIXED Broken handling of <a name="..."/> tags Minor Resolved
   Bug TIKA-263 FIXED Core parser classes duplicated in the tika-parser and tika-core jar files. Minor Resolved
   Improvement TIKA-276 FIXED Drop the StringUtils class Minor Resolved
   Improvement TIKA-280 FIXED Fix NOTICE files to match consensus from legal team Minor Resolved
   Bug TIKA-309 FIXED Mime type application/rdf+xml not correctly detected Minor Resolved
   Improvement TIKA-292 FIXED PDFBox is too verbose Minor Resolved
   Bug TIKA-297 FIXED The HtmlParser ignores <menu> tags, resulting in invalid XHTML Minor Resolved
   Improvement TIKA-299 FIXED Update Geronimo dependency in tika-parsers pom.xml to 1.0.1 Minor Resolved
   Improvement TIKA-158 FIXED Upgrade to Apache PDFBox Minor Resolved
   Bug TIKA-250 FIXED XLS parser does not extract empty sheet names Minor Resolved
   Bug TIKA-290 FIXED org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16 Minor Resolved
   Improvement TIKA-313 FIXED patch: ODF improvements for svg:desc, presentation notes Minor Resolved
   Improvement TIKA-301 FIXED patch: embedded ODF and office:annotation Minor Resolved
   Bug TIKA-271 FIXED secure-processing not supported by some JAXP implementations Minor Resolved
   Improvement TIKA-264 FIXED Getting Started: change "source directory" to "base directory" or similar Trivial Resolved
   Test TIKA-306 FIXED patch: OOXMLParserTest uses OpenOfficeParser Trivial Resolved
   Improvement TIKA-300 FIXED rename openoffice.. parser classes to odf.. Trivial Resolved
Apache Tika 0.4
   Bug TIKA-260 FIXED Weird transitive dependencies from commons-logging Blocker Closed
   Bug TIKA-258 FIXED AutoDetectParser does not allow to use alternative mime detector Major Closed
   Improvement TIKA-253 FIXED Better mime type for ooxml files Major Closed
   Improvement TIKA-222 FIXED Drop commons-codec dependency from tika-core Major Closed
   Bug TIKA-255 FIXED Embedded Visio Content Crashes PPT Parser Major Closed
   Bug TIKA-121 FIXED MimeType.clean method no longer exists as a capability Major Closed
   Bug TIKA-244 FIXED Missing Header/Footer text for Word'97 documents Major Closed
   Improvement TIKA-248 FIXED No logging in tika-core Major Closed
   Bug TIKA-208 FIXED Special characters in HTML file are not parsed correctly Major Closed
   Improvement TIKA-219 FIXED Split Tika to separate modules Major Closed
   Bug TIKA-257 FIXED Incorrect mime-type detection for ooxml Major Closed
   Improvement TIKA-215 FIXED Use a thread pool in ParsingReader Major Closed
   New Feature TIKA-216 FIXED Zip bomb prevention Major Closed
   Improvement TIKA-226 FIXED [PATCH] Generate javadocs and source indexes for every module Major Closed
   Improvement TIKA-230 FIXED [PATCH] Parent pom Major Closed
   Bug TIKA-225 FIXED [PATCH] Various bugfixes for MIME detection Major Closed
   Bug TIKA-210 FIXED html content directly under body node not parsed correctly Major Closed
   Bug TIKA-211 FIXED memory issue in ExcelExtractor Major Closed
   Improvement TIKA-247 FIXED parse language and category from MS Office properties Major Closed
   Improvement TIKA-254 FIXED parse ooxml templates and macro-enabled formats Major Closed
   New Feature TIKA-228 FIXED Add OSGi metadata to Tika Minor Closed
   New Feature TIKA-200 FIXED Allow URL drag and drop in the Tika GUI Minor Closed
   Improvement TIKA-198 FIXED Better distinction between IOException and TikaException Minor Closed
   Improvement TIKA-237 FIXED Better distinction between SAXException and TikaException Minor Closed
   Improvement TIKA-238 FIXED Better handling of delegating parser implementations Minor Closed
   Improvement TIKA-234 FIXED Drop SpellCheckedMetadata Minor Closed
   Improvement TIKA-221 FIXED Drop log4j dependency from tika-core Minor Closed
   Bug TIKA-240 FIXED Drop the BOM when extracting plain text Minor Closed
   Improvement TIKA-206 FIXED Improved pipe mode in Tika CLI Minor Closed
   Improvement TIKA-249 FIXED Inline key commons-io classes Minor Closed
   Improvement TIKA-233 FIXED Inline the ICU4J charset detection logic Minor Closed
   Bug TIKA-193 FIXED PDFParser adds mime-type twice Minor Closed
   Improvement TIKA-229 FIXED Per-component LICENSE and NOTICE files Minor Closed
   Improvement TIKA-220 FIXED Remove obsolete utility code Minor Closed
   Improvement TIKA-74 FIXED Test Resources should be loaded by the class loader (e.g. getResourceAsStream()). Minor Closed
   Bug TIKA-217 FIXED TikaConfig fails when a parser can't be loaded due to an Error Minor Closed
   Improvement TIKA-204 FIXED Use commons-compress for parsing packages Minor Closed
   New Feature TIKA-80 WON'T FIX Utility method in MimeUtils to perform full mime resolution using all available strategies Minor Closed
Apache Tika 0.3
   Bug TIKA-196 FIXED Configuration parser fails in Java 1.4 Critical Closed
   New Feature TIKA-201 FIXED Extract lyrics and other text from MIDI audio files Major Closed
   Bug TIKA-197 FIXED Microsoft Outlook (msg) files get parsed multiple times Major Closed
   New Feature TIKA-152 FIXED Support for Office XML files Major Closed
   Improvement TIKA-194 FIXED Support java regular expressions in glob pattern spec for mime repo Major Closed
   Bug TIKA-179 FIXED Tika stand alone CLI --text output mostly not working, other output formats are fine Major Closed
   Bug TIKA-180 FIXED XHTMLContentHandler unable to extract text from MSWord file Major Closed
   Bug TIKA-190 FIXED wrong handling of ignorableWhitespace/characters in SafeContentHandler and WriteoutContentHandler Major Closed
   Bug TIKA-79 WON'T FIX Mime type detection from file header appears to be failing. Major Closed
   Improvement TIKA-192 FIXED Add glob and magic patterns for image types Minor Closed
   Improvement TIKA-188 FIXED Automatic whitespace for block elements in XHTMLContentHandler Minor Closed
   Improvement TIKA-184 FIXED Avoid the <resource/> entry on ${basedir} Minor Closed
   Improvement TIKA-154 FIXED Better detection of plain text versus binary formats with a text header Minor Closed
   Improvement TIKA-203 FIXED Earlier metadata extraction in ParsingReader Minor Closed
   Improvement TIKA-183 FIXED Fix Maven plugin versions Minor Closed
   Bug TIKA-181 FIXED Retrotranslator plugin fails if using a 1.0-SNAPSHOT version Minor Closed
   Bug TIKA-189 FIXED Text extraction from Excel files juxtaposes cells Minor Closed
   Improvement TIKA-202 FIXED Warnings during Site generation Minor Closed
   Bug TIKA-185 FIXED XML files with (unsatisfied) SYSTEM entities can not be extracted Minor Closed
   Improvement TIKA-205 FIXED Factor out met keys in MimeTypesReader representing XML tag/attr names Trivial Closed
   Improvement TIKA-186 FIXED Refactor the MS Office property names to MSOffice.java Trivial Closed

Project Summary

Open: 54 (16%)
Resolved: 67 (20%)
Closed: 217 (64%)

Open Issues

By Priority
Critical: 1 (2%)
Major: 19 (35%)
Minor: 33 (61%)
Trivial: 1 (2%)

By Assignee
Chris A. Mattmann: 1 (2%)
Dave Meikle: 1 (2%)
Jukka Zitting: 6 (11%)
Unassigned: 46 (85%)