All Projects : Tika (Key: TIKA)

Project Lead: Jukka Zitting
URL: http://lucene.apache.org/tika/
Description:
Apache Tika issues

Subversion Commits

Repository Revision Date User Message
ASF #884889 Fri Nov 27 14:59:54 UTC 2009 jukka TIKA-321: Optimize type detection speed

Adapt test case to previous commit
Files Changed
MODIFY /lucene/tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/MimeTypesTest.java
ASF #884888 Fri Nov 27 14:59:11 UTC 2009 jukka TIKA-321: Optimize type detection speed

Reduce the memory overhead of MimeType by getting rid of two unused TreeSets per instance
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeType.java
ASF #884340 Wed Nov 25 23:41:10 UTC 2009 mattmann - fix for TIKA-336 More issues with RDF mime detection
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-difficult-rdf2.xml
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
ASF #883308 Mon Nov 23 11:40:22 UTC 2009 jukka TIKA-321: Optimize type detection speed

Refactor to reduce the number of Clause objects that type detection needs to go through
Files Changed
DEL /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/Operator.java
DEL /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MagicClause.java
ADD /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/OrClause.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeTypesReader.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/Clause.java
ADD /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/AndClause.java
ASF #883306 Mon Nov 23 11:28:16 UTC 2009 jukka TIKA-330: Better HWP (Hangul Word Processor) detection pattern
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
ASF #881744 Wed Nov 18 12:14:06 UTC 2009 jukka TIKA-321: Optimize type detection speed

Move the magic pattern parsing code to MimeTypesReader
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeTypesReader.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MagicMatch.java
ASF #881342 Tue Nov 17 15:46:41 UTC 2009 jukka TIKA-325: tika-parent/pom.xml missing <inceptionYear>2007</inceptionYear>

Fixed as suggested by Luke Nezda
Files Changed
MODIFY /lucene/tika/trunk/tika-parent/pom.xml
ASF #881320 Tue Nov 17 15:09:01 UTC 2009 jukka TIKA-324: Tika CLI mangles UTF-8 content in text (-t) mode (on Mac OS X)

Use UTF-8 as the default encoding on Mac OS X. The Java platform encoding is still set to MacRoman even though most parts of OS X already use UTF-8.
Files Changed
MODIFY /lucene/tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
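The encoding choice described in the TIKA-324 commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual TikaCLI change: the class name `OutputEncoding` and its methods are hypothetical, and the `os.name` prefix check is an assumption about how the platform might be detected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from TikaCLI.
final class OutputEncoding {
    private OutputEncoding() {
    }

    /**
     * Returns UTF-8 when the OS name indicates Mac OS X,
     * otherwise the supplied platform default.
     */
    static Charset choose(String osName, Charset platformDefault) {
        // Assumption: the platform is identified by the "os.name" prefix.
        if (osName != null && osName.startsWith("Mac OS X")) {
            return StandardCharsets.UTF_8;
        }
        return platformDefault;
    }

    /** Convenience wrapper that reads the values from the running JVM. */
    static Charset chooseForCurrentPlatform() {
        return choose(System.getProperty("os.name"), Charset.defaultCharset());
    }
}
```

A caller could wrap `System.out` in an `OutputStreamWriter` built with the returned charset, so that text output no longer depends on the platform default.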
ASF #881285 Tue Nov 17 13:43:48 UTC 2009 jukka TIKA-321: Optimize type detection speed

Add a simple benchmark class for testing type detection speed.
Files Changed
ADD /lucene/tika/trunk/tika-core/src/test/java/org/apache/tika/TypeDetectionBenchmark.java
ASF #881276 Tue Nov 17 13:30:27 UTC 2009 jukka TIKA-326: Map javax.imageio.IIOException to TikaException
Files Changed
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java
ASF #880852 Mon Nov 16 17:05:09 UTC 2009 jukka TIKA-321: Optimize type detection speed

Make the MagicDetector class thread-safe and reduce the amount of memory writes during matching.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java
ASF #880827 Mon Nov 16 16:18:46 UTC 2009 jukka TIKA-321: Optimize type detection speed

Use the new MagicDetector class in MagicMatch to avoid costly BigInteger calculations.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MagicMatch.java
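To illustrate how magic bytes can be matched without BigInteger arithmetic, here is a minimal sketch of a masked byte-pattern matcher that compares one byte at a time. It is not the MagicDetector source: the class name `BytePatternMatcher`, its constructor parameters, and the offset-range semantics are assumptions made for this example. The matcher keeps no mutable state during matching, so one instance can be shared between threads.

```java
// Hypothetical sketch; not the actual MagicDetector implementation.
final class BytePatternMatcher {
    private final byte[] pattern;
    private final byte[] mask;
    private final int offsetStart;
    private final int offsetEnd;

    /**
     * @param pattern     bytes to look for
     * @param mask        bit mask applied to both input and pattern, same length as pattern
     * @param offsetStart first offset (inclusive) at which the pattern may start
     * @param offsetEnd   last offset (inclusive) at which the pattern may start
     */
    BytePatternMatcher(byte[] pattern, byte[] mask, int offsetStart, int offsetEnd) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask lengths differ");
        }
        if (offsetStart < 0 || offsetEnd < offsetStart) {
            throw new IllegalArgumentException("invalid offset range");
        }
        // Defensive copies keep the matcher immutable.
        this.pattern = pattern.clone();
        this.mask = mask.clone();
        this.offsetStart = offsetStart;
        this.offsetEnd = offsetEnd;
    }

    /** Returns true if the masked pattern occurs at any allowed offset of the prefix. */
    boolean matches(byte[] prefix) {
        for (int offset = offsetStart; offset <= offsetEnd; offset++) {
            if (offset + pattern.length > prefix.length) {
                return false; // not enough input left for a match
            }
            if (matchesAt(prefix, offset)) {
                return true;
            }
        }
        return false;
    }

    private boolean matchesAt(byte[] data, int offset) {
        for (int i = 0; i < pattern.length; i++) {
            // Plain int operations on single bytes; no big-number objects are allocated.
            if ((data[offset + i] & mask[i]) != (pattern[i] & mask[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With a mask of `0xDF` on every byte, for example, an ASCII letter pattern matches regardless of case, because bit 5 is the only bit that differs between upper and lower case letters.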
ASF #880815 Mon Nov 16 15:50:07 UTC 2009 jukka TIKA-321: Optimize type detection speed

Use the new XmlRootExtractor instead of the old regexp patterns for detecting different types of XML. This is notably faster than before as we need only a single pass over the initial bytes of the document.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeType.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java
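The single-pass idea in the XmlRootExtractor commit message above can be sketched with a SAX parser that stops at the first start tag. This is an assumption-laden illustration rather than the XmlRootExtractor source: the class name `RootElementSniffer`, the method `sniff`, and the choice to abort parsing with an exception are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not the actual XmlRootExtractor implementation.
final class RootElementSniffer {
    private RootElementSniffer() {
    }

    /**
     * Returns the namespace-qualified name of the root element found in the
     * given document prefix, or null if no start tag could be parsed.
     */
    static QName sniff(byte[] prefix) {
        RootHandler handler = new RootHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(new ByteArrayInputStream(prefix), handler);
        } catch (Exception e) {
            // Either the handler's stop signal or a parse error on truncated or
            // non-XML input; in both cases the handler holds whatever was found.
        }
        return handler.root;
    }

    private static final class RootHandler extends DefaultHandler {
        QName root;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            root = new QName(uri, localName);
            // The root element is all we need, so abort the parse here.
            throw new SAXException("root element found");
        }
    }
}
```

Because parsing ends at the first start tag, the input only needs to contain the beginning of the document; a truncated prefix is enough as long as the root start tag is complete.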
ASF #880784 Mon Nov 16 15:14:43 UTC 2009 jukka TIKA-309: Mime type application/rdf+xml not correctly detected

Move explanatory comments down in the test files to avoid interfering with the detection patterns.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-difficult-rdf1.xml
MODIFY /lucene/tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-difficult-rdf2.xml
ASF #880782 Mon Nov 16 15:07:22 UTC 2009 jukka TIKA-309: Mime type application/rdf+xml not correctly detected

Use local copies of the test documents to avoid test cases that depend on external network resources.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
ADD /lucene/tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-difficult-rdf1.xml
ADD /lucene/tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-difficult-rdf2.xml
ASF #836114 Sat Nov 14 03:47:55 UTC 2009 mattmann RE: TIKA-309, yes I can't count (4*1024 = 4096).
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
ASF #836112 Sat Nov 14 03:43:09 UTC 2009 mattmann - increasing the offset to 4k bytes for an appearing <html tag seems to have fixed the unstable build issue introduced by TIKA-309
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
ASF #836090 Sat Nov 14 02:09:28 UTC 2009 mattmann - remove duplicate glob: TIKA-309
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
ASF #836057 Fri Nov 13 23:20:32 UTC 2009 jukka TIKA-275: Parse context

Replace the context Map with an explicit ParseContext object.
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/Parser.java
ADD /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ParseContext.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubContentParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/Bzip2Parser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/MboxParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jpeg/JpegParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ParsingReader.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/audio/MidiParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/xml/XMLParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/CpioParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/audio/AudioParser.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ErrorParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mbox/MboxParserTest.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/DelegatingParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/txt/TXTParser.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ParserDecorator.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlParser.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/Tika.java
MODIFY /lucene/tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ParserPostProcessor.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/GzipParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/asm/ClassParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/TarParser.java
MODIFY /lucene/tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/ExternalParser.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/EmptyParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ArParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFParser.java
MODIFY /lucene/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentContentParser.java
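To show what replacing an untyped context Map with an explicit context object can look like, here is a minimal, self-contained sketch keyed by class. It is deliberately named `TypedContext` to make clear that it is a hypothetical illustration and not the ParseContext class added in this commit; the method names and the null-handling are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a typed context object; not Tika's ParseContext.
final class TypedContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    /** Stores a value under its type; a null value removes the entry. */
    <T> void set(Class<T> key, T value) {
        if (value != null) {
            entries.put(key, value);
        } else {
            entries.remove(key);
        }
    }

    /** Returns the value stored for the given type, or null if there is none. */
    <T> T get(Class<T> key) {
        // Class.cast keeps the lookup type-safe without an unchecked cast.
        return key.cast(entries.get(key));
    }

    /** Returns the stored value, or the given default if there is none. */
    <T> T get(Class<T> key, T defaultValue) {
        T value = get(key);
        return value != null ? value : defaultValue;
    }
}
```

Compared with a plain `Map<String, Object>`, callers of such an object need no casts and cannot misspell a string key, which is one plausible motivation for a change of this kind.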
ASF #836035 Fri Nov 13 22:35:27 UTC 2009 mattmann - fix for TIKA-309: Mime type application/rdf+xml not correctly detected
Files Changed
MODIFY /lucene/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
MODIFY /lucene/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
MODIFY /lucene/tika/trunk/CHANGES.txt
MODIFY /lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MimeType.java

Project Summary

Open: 56 (17%)
Resolved: 65 (19%)
Closed: 217 (64%)

Open Issues

By Priority
Critical: 1 (2%)
Major: 21 (38%)
Minor: 33 (59%)
Trivial: 1 (2%)

By Assignee
Chris A. Mattmann: 1 (2%)
Dave Meikle: 1 (2%)
Jukka Zitting: 6 (11%)
Unassigned: 48 (86%)