  1. Beam
  2. BEAM-8306

Improve estimation of the data byte size read from the source in ElasticsearchIO

Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.14.0
    • Fix Version/s: 2.17.0
    • Component/s: io-java-elasticsearch
    • Labels: None

    Description

      ElasticsearchIO splits a BoundedSource based on the size of the Elasticsearch index. Splitting based on the size of the query result should be more accurate.

      Currently, we have a large Elasticsearch index, but our query matches only a few of its documents. ElasticsearchIO nevertheless splits the source into up to 1024 BoundedSources on Google Dataflow, so it takes a long time to process this small number of Elasticsearch documents.
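
To make the request concrete, here is a minimal sketch of one way the estimate could be derived from the query result instead of the whole index: scale the index size by the fraction of documents the query matches, then derive the split count from that. This is plain Java and not ElasticsearchIO code. The class `QuerySizeEstimator`, its method names, and the 64 MiB bundle size are hypothetical; the 1024 cap is taken from the figure quoted above. How the document counts and index size are obtained from Elasticsearch is left out.

```java
public final class QuerySizeEstimator {
    // Upper bound on the number of splits, taken from the "up to 1024" figure in the description.
    static final int MAX_SPLITS = 1024;

    /**
     * Hypothetical estimate: average bytes per document in the index,
     * multiplied by the number of documents the query matches.
     */
    static long estimateQueryBytes(long indexBytes, long totalDocs, long matchingDocs) {
        if (totalDocs <= 0 || matchingDocs <= 0) {
            return 0L;
        }
        long matched = Math.min(matchingDocs, totalDocs);
        // Use double arithmetic so the intermediate product cannot overflow a long.
        return (long) Math.ceil((double) indexBytes / totalDocs * matched);
    }

    /** Number of splits for a given size estimate, at least 1 and at most MAX_SPLITS. */
    static int splitCount(long estimatedBytes, long desiredBundleBytes) {
        if (desiredBundleBytes <= 0) {
            throw new IllegalArgumentException("desiredBundleBytes must be positive");
        }
        long splits = (estimatedBytes + desiredBundleBytes - 1) / desiredBundleBytes;
        return (int) Math.max(1, Math.min(MAX_SPLITS, splits));
    }

    public static void main(String[] args) {
        long indexBytes = 100L * 1024 * 1024 * 1024; // assumed 100 GiB index
        long totalDocs = 100_000_000L;               // assumed documents in the index
        long matchingDocs = 1_000L;                  // assumed documents matched by the query
        long bundleBytes = 64L * 1024 * 1024;        // assumed desired bundle size (64 MiB)

        // Splitting on the full index size hits the cap.
        System.out.println("index-based splits: " + splitCount(indexBytes, bundleBytes));
        // Splitting on the scaled estimate yields a single source.
        long queryBytes = estimateQueryBytes(indexBytes, totalDocs, matchingDocs);
        System.out.println("query-based splits: " + splitCount(queryBytes, bundleBytes));
    }
}
```

With these assumed numbers, the index-based calculation reaches the 1024 cap while the query-based estimate fits in one split, which is the behaviour difference this issue asks for.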

       

       

          People

            derek.he Derek He

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 10m
