
Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0-incubating
    • Fix Version/s: enhancer-0.10.0
    • Component/s: Enhancer
    • Labels: None

    Description

      While adapting the TikaEngine and the MetaxaEngine to the new ContentItemFactory pattern, I recognized that it is important to support streaming content to a Blob. Otherwise, engines of this kind would need to temporarily hold the whole transformed version of the content (e.g. the extracted text/plain, XHTML, ...) before they could create a new Blob via one of the ContentItemFactory#createBlob(...) methods.

      The following extension to the ContentItemFactory avoids this issue and allows content to be "streamed" to a Blob.

      Method added to the ContentItemFactory:

      /** Creates a new ContentSink */
      + createContentSink(String mediaType) : ContentSink;

      and the new interface ContentSink:

      /** Getter for the OutputStream */
      + getOutputStream() : OutputStream;
      /** Getter for the Blob */
      + getBlob() : Blob;

      _Note:_ Users MUST NOT pass the Blob of a ContentSink to any other component until all data has been written to the OutputStream, because other components might otherwise read partial data when calling Blob#getStream(). This feature is intended to reduce the memory footprint, not to support concurrent writing and reading of data as pipes do.
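
To make the contract above more concrete, here is a minimal sketch of how a ContentSink could be backed by a temporary file so that the streamed content never has to be held in memory as a whole. This is not the actual Stanbol implementation: the Blob and ContentSink interfaces are reduced to the methods named in this issue (plus an assumed getMimeType()), and FileContentSink and ContentSinkSketch are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified stand-ins for the interfaces discussed in this issue (assumption:
// the real interfaces may declare more methods and different exceptions).
interface Blob {
    String getMimeType();
    InputStream getStream() throws IOException;
}

interface ContentSink {
    OutputStream getOutputStream();
    Blob getBlob();
}

// Hypothetical implementation: content is spooled to a temporary file,
// and the Blob reads it back from that file on demand.
final class FileContentSink implements ContentSink {
    private final OutputStream out;
    private final Blob blob;

    FileContentSink(String mediaType) throws IOException {
        final Path file = Files.createTempFile("content-sink", ".tmp");
        file.toFile().deleteOnExit();
        this.out = Files.newOutputStream(file);
        this.blob = new Blob() {
            @Override
            public String getMimeType() {
                return mediaType;
            }

            @Override
            public InputStream getStream() throws IOException {
                // Only safe once the writer has closed the OutputStream;
                // earlier calls may observe partial data.
                return Files.newInputStream(file);
            }
        };
    }

    @Override
    public OutputStream getOutputStream() {
        return out;
    }

    @Override
    public Blob getBlob() {
        return blob;
    }
}

final class ContentSinkSketch {
    // Writes the text through the sink, closes the stream, then reads the Blob.
    static String roundTrip(String text) throws IOException {
        ContentSink sink = new FileContentSink("text/plain");
        try (Writer writer = new OutputStreamWriter(sink.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        // The Blob is handed on only after the stream has been closed.
        try (InputStream in = sink.getBlob().getStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In this sketch the Blob is read only after the writer has been closed, which mirrors the rule in the note: the sink does not coordinate readers and writers, so the ordering is the caller's responsibility.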

      _Intended Usage:_

      This example shows a typical usage of a ContentSink within the processEnhancement(..) method of an EnhancementEngine

      ContentItem ci; // the content item to process
      ContentSink plainTextSink = contentItemFactory.createContentSink("text/plain");
      Writer writer = new OutputStreamWriter(plainTextSink.getOutputStream(), "UTF-8");
      try {
          // pass the writer to the framework that extracts the text
      } finally {
          // closing the writer flushes any buffered data to the sink
          IOUtils.closeQuietly(writer);
      }
      // now that all data is written, add the Blob to the ContentItem
      UriRef textBlobUri; // create a UriRef for the Blob
      ci.addPart(textBlobUri, plainTextSink.getBlob());
      plainTextSink = null;

          People

            rwesten Rupert Westenthaler