Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
The recent refactor of HBase.ReadAll in BEAM-9279 creates a new connection in the @ProcessElement method, that is, once per element. When a pipeline runs in streaming mode this can be costly, so we should find a way to cache and reuse connections. This would avoid both slow read startup and saturating the cluster with connections.
Note that this is an ongoing issue for DoFn-based IOs. It first showed up in writes for JdbcIO (BEAM-7230) and was recently discussed again in the context of the CassandraIO refactor: https://github.com/apache/beam/pull/10546#issuecomment-580619044
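
To make the caching idea concrete, here is a minimal sketch of a reference-counted connection cache in plain Java. It is not the HBaseIO implementation and uses no Beam or HBase APIs: the class name SharedConnections, its methods, and the factory parameter are all hypothetical names chosen for illustration, and the connection is modeled as any AutoCloseable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: shares one connection per configuration key and
 * closes it only when the last holder releases it.
 */
final class SharedConnections<K, C extends AutoCloseable> {

  /** A cached connection together with the number of current holders. */
  private static final class Entry<C> {
    final C connection;
    int refCount;

    Entry(C connection) {
      this.connection = connection;
    }
  }

  private final Map<K, Entry<C>> entries = new HashMap<>();
  private final Function<K, C> factory;

  /** The factory is assumed to open a new connection for the given key. */
  SharedConnections(Function<K, C> factory) {
    this.factory = factory;
  }

  /** Returns the cached connection for the key, creating it on first use. */
  synchronized C acquire(K key) {
    Entry<C> entry = entries.get(key);
    if (entry == null) {
      entry = new Entry<>(factory.apply(key));
      entries.put(key, entry);
    }
    entry.refCount++;
    return entry.connection;
  }

  /** Drops one reference and closes the connection once nobody holds it. */
  synchronized void release(K key) {
    Entry<C> entry = entries.get(key);
    if (entry == null) {
      return; // nothing was acquired for this key
    }
    entry.refCount--;
    if (entry.refCount == 0) {
      entries.remove(key);
      try {
        entry.connection.close();
      } catch (Exception e) {
        throw new IllegalStateException("Failed to close connection", e);
      }
    }
  }

  /** Number of distinct connections currently open. */
  synchronized int openConnections() {
    return entries.size();
  }
}
```

In a DoFn, one possible arrangement would be to call acquire from the @Setup method and release from the @Teardown method, so that @ProcessElement only uses an already open connection. If the cache is held in a static field, DoFn instances in the same JVM that use the same configuration key would share a single connection.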