Details
Sub-task
Status: Closed
Major
Resolution: Fixed
0.9.0-incubating
None
Description
While adapting the TikaEngine and the MetaxaEngine to the new ContentItemFactory pattern, I realized that it is important to support streaming of content to a Blob. Otherwise, these kinds of engines would need to temporarily hold the whole transformed version of the content (e.g. the extracted text/plain, XHTML, ...) in memory before they could create a new Blob via one of the ContentItemFactory#createBlob(...) methods.
The following extension to the ContentItemFactory avoids this issue and allows content to be "streamed" to a Blob.
Method added to the ContentItemFactory:
/** Creates a new ContentSink */
+ createContentSink(String mediaType) : ContentSink;
and the new interface ContentSink:
/** Getter for the OutputStream */
+ getOutputStream() : OutputStream;
/** Getter for the Blob */
+ getBlob() : Blob;
_Note:_ Users MUST NOT pass the Blob of a ContentSink to any other component until all the data has been written to the OutputStream, because other components may otherwise read partial data when calling Blob#getStream(). This feature is intended to reduce the memory footprint, not to support concurrent writing and reading of data as pipes do.
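The following is a minimal, self-contained sketch of how such a sink could behave. It is not the Stanbol implementation. The Blob interface here is a simplified stand-in, FileContentSink and roundTrip are hypothetical names, and spooling to a temporary file is only one assumed way to keep the content out of memory. It assumes Java 9 or later (for InputStream#readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentSinkSketch {

    /** Simplified stand-in for the Blob interface (only the parts used here). */
    interface Blob {
        String getMimeType();
        InputStream getStream() throws IOException;
    }

    /** Java rendering of the proposed ContentSink interface. */
    interface ContentSink {
        OutputStream getOutputStream();
        Blob getBlob();
    }

    /** Hypothetical sink that spools the written bytes to a temporary file. */
    static final class FileContentSink implements ContentSink {
        private final Path file;
        private final OutputStream out;
        private final Blob blob;

        FileContentSink(final String mediaType) throws IOException {
            file = Files.createTempFile("content-sink", ".bin");
            file.toFile().deleteOnExit();
            out = Files.newOutputStream(file);
            // The Blob only opens the file when getStream() is called,
            // so it must not be read before writing has finished.
            blob = new Blob() {
                public String getMimeType() { return mediaType; }
                public InputStream getStream() throws IOException {
                    return Files.newInputStream(file);
                }
            };
        }

        public OutputStream getOutputStream() { return out; }
        public Blob getBlob() { return blob; }
    }

    /** Streams text into a sink, then reads it back through the Blob. */
    static String roundTrip(String text) throws IOException {
        ContentSink sink = new FileContentSink("text/plain");
        try (Writer writer = new OutputStreamWriter(
                sink.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer flushes and closes the sink's stream
        try (InputStream in = sink.getBlob().getStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("streamed plain text"));
    }
}
```

In this sketch the Blob is read only after the writer has been closed, which matches the restriction in the note: a reader that opened the stream earlier could see a truncated file.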
_Intended Usage:_
This example shows the typical usage of a ContentSink within the processEnhancement(..) method of an EnhancementEngine:
ContentItem ci; // the content item to process
ContentSink plainTextSink = contentItemFactory.createContentSink("text/plain");
Writer writer = new OutputStreamWriter(plainTextSink.getOutputStream(), "UTF-8");
try {
    // write the transformed content to the writer
} finally {
    IOUtils.closeQuietly(writer);
}
// now add the Blob to the ContentItem
UriRef textBlobUri; // create a UriRef for the Blob
ci.addPart(textBlobUri, plainTextSink.getBlob());
plainTextSink = null;