Beam / BEAM-9554

Improve connection reuse on HBaseIO.ReadAll

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Component/s: io-java-hbase

    Description

      The recent refactor of HBaseIO.ReadAll in BEAM-9279 creates a new connection in the @ProcessElement method, that is, once per element. When a pipeline runs in streaming mode this can be costly, so we should find a way to cache and reuse connections to avoid both slow read start-up and saturating the cluster.

      Note that this is an ongoing issue for DoFn-based IOs. It first showed up on writes in JdbcIO (BEAM-7230) and was recently discussed in the context of the CassandraIO refactor: https://github.com/apache/beam/pull/10546#issuecomment-580619044
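
      One possible shape for the connection reuse described above is a small reference-counted cache that hands out a single shared resource per key and closes it only when the last user releases it. The sketch below is a plain-Java illustration, not the HBaseIO implementation: the class name SharedResourceCache, its acquire/release methods, and the idea of keying by a cluster identifier are all hypothetical. In a DoFn, the assumption is that acquire would be called from @Setup and release from @Teardown, so @ProcessElement no longer opens a connection per element.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical reference-counted cache: one shared AutoCloseable resource per key.
 * Illustration only; not part of Beam or HBase.
 */
final class SharedResourceCache<K, R extends AutoCloseable> {

  /** Holds a resource together with the number of current users. */
  private static final class Entry<R> {
    final R resource;
    int refCount;

    Entry(R resource) {
      this.resource = resource;
    }
  }

  private final Map<K, Entry<R>> entries = new HashMap<>();
  private final Function<K, R> factory;

  /** The factory opens a new resource for a key, e.g. a connection to one cluster. */
  SharedResourceCache(Function<K, R> factory) {
    this.factory = factory;
  }

  /** Returns the shared resource for the key, creating it on first use. */
  synchronized R acquire(K key) {
    Entry<R> entry = entries.get(key);
    if (entry == null) {
      entry = new Entry<>(factory.apply(key));
      entries.put(key, entry);
    }
    entry.refCount++;
    return entry.resource;
  }

  /** Drops one reference and closes the resource once nobody uses it anymore. */
  synchronized void release(K key) {
    Entry<R> entry = entries.get(key);
    if (entry == null) {
      return; // Unknown key: nothing to release.
    }
    if (--entry.refCount == 0) {
      entries.remove(key);
      try {
        entry.resource.close();
      } catch (Exception e) {
        throw new IllegalStateException("Failed to close shared resource", e);
      }
    }
  }
}
```

      With such a cache held in a static field, all DoFn instances in the same worker JVM that read from the same cluster would share one connection. Whether that is safe depends on the client being thread-safe, which should be checked for the connection type in question.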

          People

            Assignee: Unassigned
            Reporter: Ismaël Mejía (iemejia)