[RANGER-3987] Potential risk of OOM - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: admin
Labels: None

Description

Every time a component's plugin loads policies, the attribute "LastActivationTimeInMillis" is set to System.currentTimeMillis(). See loadPolicy():

// from PolicyRefresher.java loadPolicy()

//load policy from PolicyAdmin
ServicePolicies svcPolicies = loadPolicyfromPolicyAdmin();

if (svcPolicies == null) {
   //if Policy fetch from Policy Admin Fails, load from cache
   if (!policiesSetInPlugin) {
      svcPolicies = loadFromCache();
   }
}

if (PERF_POLICYENGINE_INIT_LOG.isDebugEnabled()) {
   long freeMemory = Runtime.getRuntime().freeMemory();
   long totalMemory = Runtime.getRuntime().totalMemory();
   PERF_POLICYENGINE_INIT_LOG.debug("In-Use memory: " + (totalMemory - freeMemory) + ", Free memory:" + freeMemory);
}

if (svcPolicies != null) {
   plugIn.setPolicies(svcPolicies);
   policiesSetInPlugin = true;
   serviceDefSetInPlugin = false;
   setLastActivationTimeInMillis(System.currentTimeMillis()); // always updated during each policy loading
   lastKnownVersion = svcPolicies.getPolicyVersion() != null ? svcPolicies.getPolicyVersion() : -1L;
} else {
   if (!policiesSetInPlugin && !serviceDefSetInPlugin) {
      plugIn.setPolicies(null);
      serviceDefSetInPlugin = true;
   }
}
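To make the effect of the branch above concrete, here is a minimal Java sketch. It is not Ranger code: RefresherModel and onPoliciesLoaded() are hypothetical names, and an injected clock replaces System.currentTimeMillis() so the run is deterministic. It models only what the branch shows: nothing in it compares the incoming version with lastKnownVersion before stamping a new activation time.

```java
import java.util.function.LongSupplier;

// Hypothetical model of the "svcPolicies != null" branch; not Ranger's PolicyRefresher.
public class RefresherModel {
    private final LongSupplier clock;          // stands in for System.currentTimeMillis()
    private long lastActivationTimeInMillis = 0L;
    private long lastKnownVersion = -1L;

    public RefresherModel(LongSupplier clock) {
        this.clock = clock;
    }

    // Like the branch above: the activation time is overwritten on every
    // successful load, without checking whether the version changed.
    public void onPoliciesLoaded(Long policyVersion) {
        lastActivationTimeInMillis = clock.getAsLong();
        lastKnownVersion = policyVersion != null ? policyVersion : -1L;
    }

    public long getLastActivationTimeInMillis() { return lastActivationTimeInMillis; }
    public long getLastKnownVersion() { return lastKnownVersion; }

    public static void main(String[] args) {
        long[] now = {1_000L};
        RefresherModel refresher = new RefresherModel(() -> now[0]);

        refresher.onPoliciesLoaded(42L);       // first load of version 42
        long first = refresher.getLastActivationTimeInMillis();

        now[0] += 30_000L;                     // next load, 30 s later (assumed interval)
        refresher.onPoliciesLoaded(42L);       // same version 42 again
        long second = refresher.getLastActivationTimeInMillis();

        System.out.println("version=" + refresher.getLastKnownVersion()
                + " activation changed=" + (first != second));
    }
}
```

Running main prints `version=42 activation changed=true`: the version is unchanged, but the activation time is not.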

As a result, the "info" column of table "x_plugin_info" has to be updated on every policy load, because it stores a JSON string that includes the activation time. See doCreateOrUpdateXXPluginInfo():

// from AssetMgr, doCreateOrUpdateXXPluginInfo().
if (lastPolicyActivationTime != null && lastPolicyActivationTime > 0
      && (dbObj.getPolicyActivationTime() == null
          || !dbObj.getPolicyActivationTime().equals(lastPolicyActivationTime))) {
   dbObj.setPolicyActivationTime(lastPolicyActivationTime);
   needsUpdating = true;
}
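The check above can be modeled in isolation. The following hypothetical sketch (PluginInfoUpdateModel is not a Ranger class, and the 30-second interval is an assumption) copies the condition and counts how many row updates a single plugin causes over 100 downloads, once with a fresh activation time on every poll and once with a stable one.

```java
// Hypothetical model of the admin-side check; not Ranger's AssetMgr.
public class PluginInfoUpdateModel {

    // Same condition as the snippet: a positive reported activation time that
    // differs from the stored one marks the row as needing an update.
    static boolean activationTimeNeedsUpdate(Long stored, Long reported) {
        return reported != null && reported > 0
                && (stored == null || !stored.equals(reported));
    }

    // Counts simulated row updates for one plugin over `polls` downloads.
    // If advancesEveryPoll is true, the plugin reports a new timestamp each time.
    static int countUpdates(int polls, boolean advancesEveryPoll) {
        Long stored = null;
        long reported = 1_000L;
        int updates = 0;
        for (int i = 0; i < polls; i++) {
            if (advancesEveryPoll) {
                reported += 30_000L;           // assumed 30 s poll interval
            }
            if (activationTimeNeedsUpdate(stored, reported)) {
                stored = reported;             // stands in for dbObj.setPolicyActivationTime(...)
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println(countUpdates(100, true));   // every poll writes the row
        System.out.println(countUpdates(100, false));  // only the first poll writes
    }
}
```

Running main prints 100 and then 1, which is the difference between a row write per download and a single write.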

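doCreateOrUpdateXXPluginInfo() runs inside a Runnable that is committed to RangerTransactionService (via activityLogger.commitAfterTransactionComplete()) once the transaction completes. In Ranger 2.3.0 the Runnable goes to RangerTransactionSynchronizationAdapter instead, but the risk may still be there. See where doCreateOrUpdateXXPluginInfo() is wrapped and submitted: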

commitWork = new Runnable() {
   @Override
   public void run() {
      doCreateOrUpdateXXPluginInfo(pluginInfo, entityType, isTagVersionResetNeeded, clusterName);
   }
}; 
...
activityLogger.commitAfterTransactionComplete(commitWork);
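One consequence of this pattern is worth spelling out: each policy download produces one Runnable, and because the anonymous class captures pluginInfo and the other arguments, every task still waiting to run keeps that data on the heap. The sketch below is a hypothetical model using a plain single-thread executor; PluginInfo and commitWorkFor() are made-up placeholders, not Ranger's classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Ranger code: one deferred Runnable per policy download.
public class DeferredWorkModel {

    // Made-up placeholder for the plugin info sent with each download request.
    static final class PluginInfo {
        final String serviceName;
        final long activationTime;

        PluginInfo(String serviceName, long activationTime) {
            this.serviceName = serviceName;
            this.activationTime = activationTime;
        }
    }

    static final AtomicInteger processed = new AtomicInteger();

    // Like the anonymous Runnable in the snippet, the lambda captures `info`,
    // so a task waiting in a queue keeps its PluginInfo reachable until it runs.
    static Runnable commitWorkFor(PluginInfo info) {
        return () -> {
            // stands in for doCreateOrUpdateXXPluginInfo(pluginInfo, ...)
            if (info.activationTime > 0) {
                processed.incrementAndGet();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 1_000; i++) {
            // each simulated request hands one Runnable to the executor
            executor.execute(commitWorkFor(new PluginInfo("hive-" + i, 1_000L + i)));
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("processed=" + processed.get());
    }
}
```

Running main prints `processed=1000`; until each task runs, its PluginInfo cannot be garbage-collected.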

RangerTransactionService uses a ScheduledExecutorService, a thread pool with an unbounded work queue, to hold the pending Runnables.
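The difference between an unbounded and a bounded queue can be seen with plain JDK executors. This sketch does not reproduce RangerTransactionService's actual configuration; it keeps a single worker busy and then counts how many submitted tasks pile up. ScheduledThreadPoolExecutor accepts every task into its internal queue, while a ThreadPoolExecutor with an ArrayBlockingQueue stops at its capacity and hands the overflow to a rejection handler (here the handler only counts drops).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: plain JDK executors, not RangerTransactionService itself.
public class QueueGrowthDemo {

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The single worker is kept busy, then `tasks - 1` more Runnables are
    // submitted. The scheduled pool's queue accepts all of them.
    static int scheduledPoolBacklog(int tasks) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch gate = new CountDownLatch(1);
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        Runnable slowTask = () -> { started.countDown(); awaitQuietly(gate); };
        try {
            pool.execute(slowTask);
            started.await();                   // the only worker is now blocked
            for (int i = 1; i < tasks; i++) {
                pool.execute(slowTask);
            }
            return pool.getQueue().size();     // grows with every submission
        } finally {
            gate.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    // Same load against a pool with a bounded queue; overflow is counted
    // by the rejection handler instead of being kept on the heap.
    static int[] boundedPoolBacklogAndDrops(int tasks, int capacity) throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(1);
        AtomicInteger dropped = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),
                (r, executor) -> dropped.incrementAndGet());
        try {
            for (int i = 0; i < tasks; i++) {
                // the first task goes straight to the worker, the rest are queued or rejected
                pool.execute(() -> awaitQuietly(gate));
            }
            return new int[] {pool.getQueue().size(), dropped.get()};
        } finally {
            gate.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("scheduled pool backlog: " + scheduledPoolBacklog(10_000));
        int[] bounded = boundedPoolBacklogAndDrops(10_000, 100);
        System.out.println("bounded pool backlog: " + bounded[0] + ", dropped: " + bounded[1]);
    }
}
```

With the values in main this should report a backlog of 9999 for the scheduled pool, and a backlog of 100 with 9899 drops for the bounded one. Whether shedding plugin-info updates is acceptable is a separate design question that this sketch does not answer.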

In our case there are 1000+ Hive and HBase instances, and Ranger Admin appears to be under tremendous pressure, because every instance periodically calls the policy-download API, which triggers an update of table "x_plugin_info". Since the core thread pool appears to be too small and the database is also likely under pressure, the work queue keeps growing until it exhausts the JVM heap and finally causes an OOM.
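A rough estimate shows why the queue can grow without bound. If N instances each poll every T seconds, updates arrive at N/T per second; whenever that exceeds the rate at which the pool drains them, the backlog grows linearly with time. The 30-second poll interval and the drain rate of 20 updates per second below are assumptions for illustration, not measured values from this report.

```java
// Back-of-envelope backlog estimate; the interval and drain rate are assumptions.
public class BacklogEstimate {

    // Arrival rate of plugin-info updates, in tasks per second.
    static double arrivalPerSecond(int instances, double pollIntervalSeconds) {
        return instances / pollIntervalSeconds;
    }

    // Tasks added to the queue over `seconds` when arrivals outpace the drain rate.
    static double backlogAfter(double arrivalPerSec, double drainPerSec, double seconds) {
        return Math.max(0.0, arrivalPerSec - drainPerSec) * seconds;
    }

    public static void main(String[] args) {
        double arrival = arrivalPerSecond(1_000, 30.0);           // 1000 instances, assumed 30 s poll
        double backlogPerHour = backlogAfter(arrival, 20.0, 3_600); // assumed drain of 20 updates/s
        System.out.printf("arrivals/s=%.1f backlog after 1h=%.0f%n", arrival, backlogPerHour);
    }
}
```

Under these assumptions, about 33 updates arrive per second against 20 drained, so roughly 48,000 extra tasks accumulate per hour; adding threads only helps while the drain rate stays above the arrival rate.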

I think adding more core threads would help, but as the system grows this code path will still add a lot of overhead. Is there a better solution?
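As one possible direction (a hypothetical sketch, not something Ranger is known to implement), updates could be coalesced: keep at most one pending activation time per plugin and let a single writer flush them, so the backlog is bounded by the number of plugins rather than the number of downloads. CoalescingUpdates, the plugin key format, and the persist step are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical mitigation sketch, not Ranger code: keep only the latest pending
// plugin-info update per plugin, so the backlog is bounded by the plugin count.
public class CoalescingUpdates {

    // Latest reported activation time per plugin key (e.g. host + service name).
    private final Map<String, Long> pending = new ConcurrentHashMap<>();

    // Called on each policy download; keeps the newest value for the plugin.
    public void submit(String pluginKey, long activationTime) {
        pending.merge(pluginKey, activationTime, Long::max);
    }

    public int pendingCount() {
        return pending.size();
    }

    // Called periodically by a single writer; each removed entry would become one DB write.
    public int drain() {
        int written = 0;
        for (String key : pending.keySet()) {
            Long value = pending.remove(key);
            if (value != null) {
                // persist(key, value) would go here (hypothetical)
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        CoalescingUpdates updates = new CoalescingUpdates();
        for (int poll = 0; poll < 100; poll++) {              // 100 polling rounds
            for (int plugin = 0; plugin < 1_000; plugin++) {  // 1000 plugins
                updates.submit("plugin-" + plugin, 1_000L + poll);
            }
        }
        System.out.println("pending=" + updates.pendingCount() + " written=" + updates.drain());
    }
}
```

With 1,000 plugins polling 100 times, main reports 1000 pending entries and 1000 writes rather than 100,000 queued tasks. The trade-off is that intermediate activation times are never written; only the latest one per plugin is kept.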

People

Assignee: KyrieG

Reporter: KyrieG

Votes: 0

Watchers: 3

Dates

Created: 25/Nov/22 16:39

Updated: 06/Mar/23 23:57