  Beam / BEAM-8306

Improve estimation of data byte size when reading from source in ElasticsearchIO


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.14.0
    • Fix Version/s: 2.17.0
    • Component/s: io-java-elasticsearch
    • Labels: None

      Description

      ElasticsearchIO splits a BoundedSource based on the Elasticsearch index size. We expect that splitting based on the query result size would be more accurate.

      Currently, we have a large Elasticsearch index, but the query result contains only a few of its documents. ElasticsearchIO still splits the source into up to 1024 BoundedSources on Google Dataflow, so processing this small number of Elasticsearch documents takes a long time.
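
      To illustrate the difference described above, here is a minimal sketch in plain Java. It is not the ElasticsearchIO implementation or the fix that was merged; the class name, method names, the proportional-scaling heuristic, and the 64 MiB bundle size are all assumptions made for illustration. Only the cap of 1024 comes from this report.

```java
// Hypothetical sketch: compares a split count derived from the whole index
// size with one derived from an estimated query result size.
final class SplitEstimateSketch {

    // Assumed heuristic: scale the index size by the fraction of documents
    // that the query matches (average document size * matching documents).
    static long estimateQueryBytes(long indexBytes, long indexDocCount, long queryDocCount) {
        if (indexBytes <= 0 || indexDocCount <= 0 || queryDocCount <= 0) {
            return 0L;
        }
        double avgDocBytes = (double) indexBytes / indexDocCount;
        long matchingDocs = Math.min(queryDocCount, indexDocCount);
        return (long) Math.ceil(avgDocBytes * matchingDocs);
    }

    // Number of sources needed for the estimated size, clamped to [1, maxSplits].
    static int numSplits(long estimatedBytes, long desiredBundleBytes, int maxSplits) {
        if (estimatedBytes <= 0 || desiredBundleBytes <= 0) {
            return 1;
        }
        long needed = (estimatedBytes + desiredBundleBytes - 1) / desiredBundleBytes;
        return (int) Math.max(1L, Math.min((long) maxSplits, needed));
    }

    public static void main(String[] args) {
        long indexBytes = 100L * 1024 * 1024 * 1024; // assumed 100 GiB index
        long indexDocs = 100_000_000L;               // assumed document count
        long queryDocs = 1_000L;                     // query matches few documents
        long bundleBytes = 64L * 1024 * 1024;        // assumed desired bundle size
        int maxSplits = 1024;                        // cap mentioned in this report

        long queryBytes = estimateQueryBytes(indexBytes, indexDocs, queryDocs);
        System.out.println("splits from index size: " + numSplits(indexBytes, bundleBytes, maxSplits));
        System.out.println("splits from query size: " + numSplits(queryBytes, bundleBytes, maxSplits));
    }
}
```

      With these assumed numbers, sizing by the index hits the 1024 cap, while sizing by the estimated query result yields a single source, which is the behavior the report asks for.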

       

       


            People

            • Assignee: Derek He (derek.he)
            • Reporter: Derek He (derek.he)
            • Votes: 0
            • Watchers: 3


                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 10m