Beam / BEAM-4389

Enable partial updates for Elasticsearch

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: io-java-elasticsearch
    • Labels:
      None

      Description

      Expose a configuration option on ElasticsearchIO to enable partial updates rather than full document inserts.

      Rationale: We have a case in which different pipelines process different categories of information about the same target entity (e.g. one for taxonomic processing, another for geospatial processing). A read-and-merge is not possible inside the batch call, so the only way to combine the results is through a join. The join approach is slow, and it also prevents running a single process in isolation (e.g. reprocessing the geospatial component of all docs).
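
To make the difference concrete, here is a minimal sketch of full-insert versus partial-update semantics, using an in-memory map as a stand-in for the index. The class `PartialUpdateSketch` and the field names `taxonKey` and `country` are hypothetical and not part of Beam or Elasticsearch, and the merge is shallow, which simplifies what a real index does with nested objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; not a Beam or Elasticsearch class.
final class PartialUpdateSketch {
  private final Map<String, Map<String, Object>> docs = new HashMap<>();

  // Full insert: the stored document is replaced wholesale.
  void index(String id, Map<String, Object> doc) {
    docs.put(id, new HashMap<>(doc));
  }

  // Partial update: only the supplied fields are overwritten (shallow merge);
  // the document is created if it does not exist yet.
  void update(String id, Map<String, Object> partial) {
    docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(partial);
  }

  Map<String, Object> get(String id) {
    return docs.get(id);
  }

  public static void main(String[] args) {
    PartialUpdateSketch store = new PartialUpdateSketch();

    // Two independent "pipelines" each contribute their own fields.
    store.update("42", Map.of("taxonKey", 212));
    store.update("42", Map.of("country", "DK"));
    System.out.println(store.get("42").keySet().size()); // prints 2

    // A full insert from one pipeline discards the other pipeline's fields.
    store.index("42", Map.of("country", "DK"));
    System.out.println(store.get("42").containsKey("taxonKey")); // prints false
  }
}
```

With update semantics, either pipeline can be rerun on its own without a join and without clobbering the other's fields.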

      This configuration parameter only makes sense when used in conjunction with controlling the document ID (possible since BEAM-3201).
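
The dependency on the document ID can be seen in the Elasticsearch bulk format, where an `update` action addresses a document by `_id`. The sketch below shows one plausible way the two write modes could be rendered as bulk request entries. It is an assumption about the mapping, not a description of how ElasticsearchIO implements it. `BulkLineSketch` and its methods are hypothetical, the index is assumed to come from the request URL, and IDs are assumed to contain no characters that need JSON escaping.

```java
// Hypothetical helper; not part of Beam or the Elasticsearch client.
final class BulkLineSketch {

  // Full insert: an "index" action line followed by the whole document.
  static String indexEntry(String id, String docJson) {
    return "{\"index\":{\"_id\":\"" + id + "\"}}\n" + docJson + "\n";
  }

  // Partial update: an "update" action line followed by the fields to merge.
  // "doc_as_upsert" creates the document if it does not exist yet.
  static String updateEntry(String id, String docJson) {
    return "{\"update\":{\"_id\":\"" + id + "\"}}\n"
        + "{\"doc\":" + docJson + ",\"doc_as_upsert\":true}\n";
  }

  public static void main(String[] args) {
    String geo = "{\"country\":\"DK\"}";
    System.out.print(indexEntry("42", geo));
    System.out.print(updateEntry("42", geo));
  }
}
```

Without a caller-supplied ID, each write would get an auto-generated ID and there would be no existing document for the update to merge into.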

      The client API would include a withUseUpdate(...) such as:

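      source.apply(
        ElasticsearchIO.write()
          .withConnectionConfiguration(connectionConfiguration)
          // Use the value of the "id" field as the document ID.
          .withIdFn(new ExtractValueFn("id"))
          // Proposed option: send partial updates instead of full document inserts.
          .withUseUpdate(true)
      );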

              People

              • Assignee: Tim Robertson (timrobertson100)
              • Reporter: Tim Robertson (timrobertson100)
              • Votes: 1
              • Watchers: 3

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 2h