Details
-
Task
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
2.2.0
-
None
Description
TikaIO is currently implemented as a BoundedSource with an asynchronous BoundedReader that returns each document's text in chunks, as Strings. These chunks reach the pipeline functions unordered and with no link back to the document they came from.
It was decided in the recent beam-dev thread that TikaIO should initially support only the case where a single composite bean per file flows to the pipeline, capturing the file's content, location (or name), and metadata. This avoids the need to implement TikaIO as a BoundedSource/BoundedReader.
Enhancing TikaIO to stream content into pipelines may be considered in the next phase, depending on the specific use cases.
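As a rough illustration of the "single composite bean per file" idea, the sketch below shows one possible shape for such a bean in plain Java. The class name `ParsedFile`, its accessors, and the choice of a single-valued `Map<String, String>` for metadata are all assumptions made for this example; they are not taken from the TikaIO code, and the real type may differ (for instance, Tika metadata can hold multiple values per key).

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical per-file result: one instance per parsed file, carrying the
 * file location, the full extracted text, and the file's metadata together.
 * This is an illustrative sketch, not the actual TikaIO class.
 */
final class ParsedFile implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String location;               // file path or name
    private final String content;                // whole extracted text, not chunks
    private final Map<String, String> metadata;  // assumed single-valued for simplicity

    ParsedFile(String location, String content, Map<String, String> metadata) {
        this.location = Objects.requireNonNull(location, "location");
        this.content = Objects.requireNonNull(content, "content");
        // Defensive copy so later changes to the caller's map do not leak in.
        this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(metadata));
    }

    String getLocation() {
        return location;
    }

    String getContent() {
        return content;
    }

    Map<String, String> getMetadata() {
        return metadata;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ParsedFile)) {
            return false;
        }
        ParsedFile other = (ParsedFile) o;
        return location.equals(other.location)
            && content.equals(other.content)
            && metadata.equals(other.metadata);
    }

    @Override
    public int hashCode() {
        return Objects.hash(location, content, metadata);
    }

    @Override
    public String toString() {
        return "ParsedFile{location=" + location + ", metadata=" + metadata + "}";
    }
}
```

Because each element carries its own location, a downstream function can always tell which document a piece of text belongs to, which is the link that the chunk-per-String design loses. The value-based `equals` and `hashCode` are included on the assumption that a pipeline may need to compare or group such elements.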