Details
Type: Umbrella
Status: Closed
Priority: Major
Resolution: Resolved
Description
A common pattern in HBase code is to look up values from the Hadoop Configuration and store them in fields, either in the constructor, in an initialization method that is also called once per instantiation, or both. This is expensive when the object is created frequently, because Configuration is a heavyweight registry that does a lot of string operations and regex matching. See the attached example: method calls into Configuration account for 48.25% of CPU time when creating an HTable object in 0.98. (The remainder is spent instantiating the RPC controller via reflection, a separate issue that merits follow-up elsewhere.) Creating an HTable instance is expected to be a lightweight operation when a client uses unmanaged HConnections; however, creating HTable instances takes up about 18% of the client's total on-CPU time. This is just one example of where constructors that read from Configuration can be harmful.
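
To make the pattern concrete, here is a minimal sketch of the two constructor styles. It is not HBase code and not necessarily how this issue was resolved: `FakeConfiguration`, `TableSettings`, `EagerTable`, and `CachedTable` are hypothetical names, `FakeConfiguration` is a stand-in for Hadoop's Configuration that only counts lookups, and the default values are illustrative. One possible mitigation, assumed here, is to parse the settings once into an immutable holder and hand that holder to each new instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration. It counts lookups so the
// per-construction cost is visible without profiling.
final class FakeConfiguration {
    private final Map<String, String> props = new HashMap<>();
    int lookups = 0;

    void set(String key, String value) {
        props.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        lookups++;
        String raw = props.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

// Hypothetical immutable snapshot of parsed settings: built once, then shared.
final class TableSettings {
    final int scannerCaching;
    final int retries;

    TableSettings(FakeConfiguration conf) {
        // Defaults below are illustrative, not authoritative HBase defaults.
        this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 100);
        this.retries = conf.getInt("hbase.client.retries.number", 10);
    }
}

// The pattern described above: every construction goes back to the registry.
final class EagerTable {
    final int scannerCaching;
    final int retries;

    EagerTable(FakeConfiguration conf) {
        this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 100);
        this.retries = conf.getInt("hbase.client.retries.number", 10);
    }
}

// Assumed alternative: construction only copies a reference to parsed values.
final class CachedTable {
    final TableSettings settings;

    CachedTable(TableSettings settings) {
        this.settings = settings;
    }
}

final class ConfigCachingDemo {
    public static void main(String[] args) {
        FakeConfiguration eagerConf = new FakeConfiguration();
        for (int i = 0; i < 1000; i++) {
            new EagerTable(eagerConf);
        }
        System.out.println("eager lookups: " + eagerConf.lookups);   // 2000

        FakeConfiguration cachedConf = new FakeConfiguration();
        TableSettings shared = new TableSettings(cachedConf);
        for (int i = 0; i < 1000; i++) {
            new CachedTable(shared);
        }
        System.out.println("cached lookups: " + cachedConf.lookups); // 2
    }
}
```

The trade-off in this sketch is that a shared snapshot does not see later changes to the configuration, so it suits settings that are fixed for the lifetime of the connection.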