TIKA-1710

Replace usages of classes in org.apache.tika.io with current alternatives

    Details

    • Flags:
      Patch

      Description

      Many of the classes in org.apache.tika.io were inlined from commons-io in TIKA-249, but these days most components depend on commons-io anyway. To reduce the dependencies on org.apache.tika.io in preparation for adding commons-io to tika-core, the following can be done:

      • Replace usages of classes in org.apache.tika.io within non-core components with the corresponding classes in commons-io
      • Replace usages of org.apache.tika.io.IOUtils.UTF_8 with java.nio.charset.StandardCharsets.UTF_8 (in all components, including tika-core)
      • Replace other uses of String encoding names of standard charsets with the corresponding Charset instances from StandardCharsets (this is logically related to IOUtils: before Java 7, these constants should have been defined there, as UTF_8 was)
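
      As a rough illustration of the last two bullets, the sketch below contrasts encoding by charset name with encoding through a StandardCharsets constant. The class and method names are hypothetical and are not taken from the patch; only the java.nio.charset and java.lang.String calls are standard JDK API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class; not part of Tika or of the attached patch.
public class CharsetMigrationSketch {

    // Before: the charset is named by a String, so the compiler forces the
    // caller to handle a checked exception.
    public static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // After: a Charset constant (available since Java 7) needs no lookup by
    // name and declares no checked exception.
    public static byte[] encodeByConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding works the same way with the constant.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] byName = encodeByName("Tika \u00e9");
        byte[] byConstant = encodeByConstant("Tika \u00e9");
        // Both variants should produce identical bytes.
        System.out.println(Arrays.equals(byName, byConstant));
        System.out.println(decode(byConstant));
    }
}
```

      Both variants produce the same bytes; the difference is that the constant removes the try/catch and the risk of a misspelled charset name.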
      1. TIKA-1710.patch
        265 kB
        Yaniv Kunda

          Activity

          kunda Yaniv Kunda added a comment -

          A patch for the described changes

          gagravarr Nick Burch added a comment -

          At first glance, this looks good on the code side, but epic! Only one slight nitpick: we generally try to avoid wildcard imports. If it's quick, any chance you could redo the patch with the standard charsets statically imported without the wildcard? (I think it's normally UTF_8, with just a few places using others.)
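
          For illustration only, the two import styles being discussed might look like the sketch below. The class and method names are hypothetical; the static import of java.nio.charset.StandardCharsets.UTF_8 is standard Java.

```java
// Explicit static import: only the UTF_8 constant is brought into scope.
import static java.nio.charset.StandardCharsets.UTF_8;
// The wildcard form that the comment asks to avoid would be:
//   import static java.nio.charset.StandardCharsets.*;

// Hypothetical example class; not taken from the patch.
public class StaticImportSketch {

    // UTF_8 resolves to StandardCharsets.UTF_8 through the static import.
    public static byte[] toUtf8(String text) {
        return text.getBytes(UTF_8);
    }

    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    public static void main(String[] args) {
        // Round-trip a string through UTF-8 bytes.
        System.out.println(fromUtf8(toUtf8("static import")));
    }
}
```

          With the explicit form, a reader can see at the top of the file exactly which constants are in use, which is harder with a wildcard.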

          kunda Yaniv Kunda added a comment -

          Revised patch without StandardCharsets wildcard static imports

          gagravarr Nick Burch added a comment -

          Thanks for this, applied in smaller chunks in r1696741 through 1696751.

          Two questions:

          • Your patch removed guava, but I couldn't see an explanation of why? I didn't commit that part, could you explain why you think it can / should be removed?
          • Our TaggedInputStream get syntax looks cleaner than what you've had to do instead. Do you think it's worth adding a helper for that / asking Commons IO to implement that pattern for us?
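
          The thread does not show either variant of the code, so the following is only a guess at the kind of helper meant by the second question: a static factory that reuses a stream that is already wrapped and wraps it otherwise. TaggingStream is a hypothetical stand-in built on java.io.FilterInputStream; it is neither the Tika nor the Commons IO class.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical example; assumes "get syntax" means a static factory method.
public class TaggedGetSketch {

    // Stand-in wrapper class, not the real TaggedInputStream.
    public static class TaggingStream extends FilterInputStream {

        public TaggingStream(InputStream in) {
            super(in);
        }

        // Helper: return the stream itself if it is already a TaggingStream,
        // otherwise wrap it. Callers then need a single call instead of an
        // instanceof check plus a constructor call.
        public static TaggingStream get(InputStream in) {
            if (in instanceof TaggingStream) {
                return (TaggingStream) in;
            }
            return new TaggingStream(in);
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        TaggingStream first = TaggingStream.get(raw);
        TaggingStream second = TaggingStream.get(first);
        // The second call should not add another wrapper layer.
        System.out.println(first == second);
    }
}
```

          Without such a helper, each call site has to repeat the instanceof check itself, which is presumably the less clean code the question refers to.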
          hudson Hudson added a comment -

          FAILURE: Integrated in tika-trunk-jdk1.7 #838 (See https://builds.apache.org/job/tika-trunk-jdk1.7/838/)
          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696751)

          • /tika/trunk/tika-bundle/src/test/java/org/apache/tika/bundle/BundleIT.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/DirListParser.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/DumpTikaConfigExample.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/ExtractEmbeddedFiles.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/MyFirstTika.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/RollbackSoftware.java
          • /tika/trunk/tika-example/src/main/java/org/apache/tika/example/SpringExample.java
          • /tika/trunk/tika-example/src/test/java/org/apache/tika/example/DumpTikaConfigExampleTest.java
          • /tika/trunk/tika-example/src/test/java/org/apache/tika/example/SimpleTextExtractorTest.java
          • /tika/trunk/tika-example/src/test/java/org/apache/tika/example/SimpleTypeDetectorTest.java

          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696750)

          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/HTMLHelper.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/LanguageResource.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TranslateResource.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/UnpackerResource.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/writer/CSVMessageBodyWriter.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/writer/JSONMessageBodyWriter.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/writer/MetadataListMessageBodyWriter.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/writer/TextMessageBodyWriter.java
          • /tika/trunk/tika-server/src/main/java/org/apache/tika/server/writer/XMPMessageBodyWriter.java
          • /tika/trunk/tika-server/src/test/java/org/apache/tika/server/CXFTestBase.java
          • /tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
          • /tika/trunk/tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java

          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696749)

          • /tika/trunk/tika-batch/pom.xml
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/BatchProcess.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/BatchProcessDriverCLI.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/Interrupter.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/BasicTikaFSConsumer.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/FSBatchProcessCLI.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/RecursiveParserWrapperFSConsumer.java
          • /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/strawman/StrawManTikaAppDriver.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/CommandLineParserBuilderTest.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/RecursiveParserWrapperFSConsumerTest.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/fs/BatchDriverTest.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/fs/BatchProcessTest.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/fs/FSBatchTestBase.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/fs/HandlerBuilderTest.java
          • /tika/trunk/tika-batch/src/test/java/org/apache/tika/batch/fs/StringStreamGobbler.java

          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696748)

          • /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
          • /tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
          • /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchCommandLineTest.java
          • /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchIntegrationTest.java
          • /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java

          TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696747)

          • /tika/trunk/tika-translate/src/main/java/org/apache/tika/language/translate/GoogleTranslator.java
          • /tika/trunk/tika-translate/src/main/java/org/apache/tika/language/translate/Lingo24Translator.java

          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696746)

          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/embedder/ExternalEmbedderTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/AutoDetectParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ParsingReaderTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmBlockInfo.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmItspHeader.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmLzxState.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmLzxcControlData.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmLzxcResetTable.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestParameters.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestPmglHeader.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/code/SourceCodeParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geo/topic/GeoParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/image/WebPParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/jdbc/SQLite3ParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/JackcessParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mock/MockParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mp3/MpegStreamTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pkg/Bzip2ParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pkg/GzipParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/strings/Latin1StringsParserTest.java
          • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/txt/TXTParserTest.java

          TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696745)

          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/audio/MidiParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmDirectoryListingSet.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItsfHeader.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItspHeader.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmLzxcControlData.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmgiHeader.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmglHeader.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmConstants.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/code/SourceCodeParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/crypto/Pkcs7Parser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESConfig.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/dif/DIFParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubContentParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/feed/FeedParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/gdal/GDALParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/NameEntityExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/hdf/HDFParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/XMPPacketScanner.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabUtils.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iwork/IWorkPackageParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/AbstractDBParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/JDBCTableReader.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/jdbc/SQLite3DBParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mail/RFC822Parser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/OutlookPSTParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/ID3v1Handler.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/ID3v2Frame.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/LyricsHandler.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentContentParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFEncodedStringDecoder.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/prt/PRTParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFEmbObjHandler.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFObjDataParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/strings/StringsParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/txt/TXTParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/utils/CommonsDigester.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/video/FLVParser.java
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/xml/XMLParser.java

          TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696743)

          • /tika/trunk/tika-core/src/test/java/org/apache/tika/detect/MagicDetectorTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/detect/TextDetectorTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TailStreamTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageIdentifierTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageProfilerBuilderTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/ProbabilisticMimeDetectionTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/ProbabilisticMimeDetectionTestWithTika.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/mock/MockParser.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java
          • /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BodyContentHandlerTest.java

          TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696741)

          • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/NNExampleModelDetector.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/NameDetector.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/TrainedModelDetector.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/embedder/ExternalEmbedder.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/fork/ForkClient.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/io/IOUtils.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageIdentifier.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageProfilerBuilder.java
          • /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
          kunda Yaniv Kunda added a comment -

          As much as I like Guava (the library, not the fruit), its only use here was its com.google.common.base.Charsets class, which contains constants for the Charset instances of the standard charsets, the same as Java's StandardCharsets.
          Once I replaced these with static imports from StandardCharsets, no use of Guava was left.
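
          A minimal sketch of the kind of replacement described above, not taken from the patch itself. The class and method names (CharsetExample, encodeByName, encodeByConstant, decode) are made up for illustration; only java.nio.charset.StandardCharsets.UTF_8 and the String methods are real JDK API. The static import names the constant explicitly rather than using a wildcard, as requested earlier in the thread.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.UnsupportedEncodingException;

public class CharsetExample {

    // Before: the charset is looked up by name, so the caller has to deal
    // with a checked exception that can never occur for "UTF-8".
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // After: the Charset constant is passed directly, no checked exception.
    static byte[] encodeByConstant(String text) {
        return text.getBytes(UTF_8);
    }

    // Decoding with the same constant.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }
}
```

          Both encode methods produce the same bytes; the second simply drops the string lookup and the try/catch around it.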

          Regarding TaggedInputStream, I wasn't sure what to do: this wrap/cast method was a modification of the original commons-io code, and it was used only once, in RFC822Parser.
          I think it's a nice-to-have optimization helper method but nothing more, as it only saves the cost of creating a new TaggedInputStream when the source InputStream is already a TaggedInputStream: the checked tag will behave the same way in the same wrap-try-catch flow.
          The only other usage of TaggedInputStream in Tika (besides TikaInputStream itself) is in RTFParser, which uses the constructor directly, and that is actually an empty usage: the TaggedInputStream is constructed and checked in the catch clause, but it is not used in the try block at all; the underlying stream is read instead!

          Since almost all of Tika uses TikaInputStream (which has an advanced version of this helper that also ensures buffering), my opinion is to refrain from adding a helper method and simply use the constructor directly, for simplicity.
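
          To illustrate the wrap-try-catch flow mentioned above, here is a rough, self-contained sketch. It does not use the real commons-io or Tika classes: TaggingInputStream, TaggedException, BrokenStream and classify are hypothetical stand-ins written only for this example, and only the single-byte read() is tagged to keep it short.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TaggedFlowSketch {

    /** Hypothetical stand-in for a tagged stream; not the commons-io class. */
    static class TaggingInputStream extends FilterInputStream {

        /** IOException that remembers which wrapper raised it. */
        static class TaggedException extends IOException {
            final Object tag;

            TaggedException(IOException cause, Object tag) {
                super(cause.getMessage(), cause);
                this.tag = tag;
            }
        }

        TaggingInputStream(InputStream in) {
            super(in);
        }

        // Only the single-byte read is tagged in this sketch.
        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                throw new TaggedException(e, this);
            }
        }

        /** True only if the exception was raised through this wrapper. */
        boolean isCauseOf(Throwable t) {
            return t instanceof TaggedException && ((TaggedException) t).tag == this;
        }
    }

    /** Hypothetical source that always fails on read. */
    static class BrokenStream extends InputStream {
        @Override
        public int read() throws IOException {
            throw new IOException("broken source");
        }
    }

    /**
     * Wrap, try, catch: returns "stream" if the failure came from the
     * wrapped stream, "other" if it came from elsewhere, "ok" otherwise.
     */
    static String classify(InputStream source, boolean failElsewhere) {
        TaggingInputStream tagged = new TaggingInputStream(source);
        try {
            tagged.read(); // must read through the wrapper, not the source
            if (failElsewhere) {
                throw new IOException("failure unrelated to the stream");
            }
            return "ok";
        } catch (IOException e) {
            return tagged.isCauseOf(e) ? "stream" : "other";
        }
    }
}
```

          If the try block read from source instead of tagged, a stream failure would never carry the tag and would be reported as "other", which is the empty usage described for RTFParser.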

          gagravarr Nick Burch added a comment -

          Thanks for the explanation. Guava dependency removed in r1696860, and TaggedInputStream swapped for TikaInputStream in r1696862.

          Hopefully that's us done for this; now we just need to wait for a consensus on the Tika Core changes!

          hudson Hudson added a comment -

          SUCCESS: Integrated in tika-trunk-jdk1.7 #843 (See https://builds.apache.org/job/tika-trunk-jdk1.7/843/)
          Bring in line with other parsers with special InputStream requirements, by using TikaInputStream TIKA-1710 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696862)
          /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mail/RFC822Parser.java
          TIKA-1710 Guava is no longer required, we have StandardCharsets instead now (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696859)
          /tika/trunk/tika-parsers/pom.xml

            People

            • Assignee:
              Unassigned
              Reporter:
              kunda Yaniv Kunda
            • Votes:
              0
              Watchers:
              3
