Details

    • Type: Task
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: io-java-tika
    • Labels: None

    Description

      TikaIO is currently implemented as a BoundedSource with an asynchronous BoundedReader that returns the text chunks of individual documents as Strings. These chunks eventually reach the pipeline functions unordered and with no link back to their original documents.

      A recent beam-dev thread concluded that TikaIO should initially support the case where only a single composite bean per file flows to the pipeline, capturing the file's content, location (or name), and metadata. This avoids the need to implement TikaIO as a BoundedSource/BoundedReader.
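
      As a rough illustration of the "single composite bean per file" idea, the sketch below shows what such a bean might look like. The class name `ParsedDocument`, its field names, and the use of a plain `Map<String, String>` for metadata are assumptions made for this example; they are not taken from the issue or from the TikaIO code.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical per-file bean: one instance per parsed file, carrying the
 * file's location, its extracted text, and its metadata together.
 * Illustrative only; not the actual TikaIO class.
 */
final class ParsedDocument implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String location;              // file location (or name)
    private final String content;               // full extracted text of the file
    private final Map<String, String> metadata; // metadata as name/value pairs (assumed shape)

    ParsedDocument(String location, String content, Map<String, String> metadata) {
        this.location = Objects.requireNonNull(location, "location");
        this.content = Objects.requireNonNull(content, "content");
        // Defensive copy so later changes to the caller's map do not leak in.
        this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(metadata));
    }

    String getLocation() { return location; }

    String getContent() { return content; }

    Map<String, String> getMetadata() { return metadata; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ParsedDocument)) return false;
        ParsedDocument other = (ParsedDocument) o;
        return location.equals(other.location)
            && content.equals(other.content)
            && metadata.equals(other.metadata);
    }

    @Override
    public int hashCode() {
        return Objects.hash(location, content, metadata);
    }

    @Override
    public String toString() {
        return "ParsedDocument{location=" + location + ", metadata=" + metadata + "}";
    }
}
```

      Because each element carries its own location, downstream functions can always tell which file a piece of text came from, which the unordered String chunks described above cannot offer.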

      Enhancing TikaIO to stream content into pipelines may be considered in the next phase, based on specific use cases...

            People

              Assignee: sergey_beryozkin Sergey Beryozkin
              Reporter: sergey_beryozkin Sergey Beryozkin