Solr / SOLR-14620

Preference checks in Autoscaling framework can go wrong




      In the Autoscaling framework, preferences are values to maximize or minimize when making replica placement decisions; they guide the framework towards the optimal placement. The default cluster preferences are to minimize the number of cores on each node, then to maximize the free disk space on each node. To evaluate a Preference, the corresponding parameter value for a node (for example, number of cores or free disk space) must be known.

      These parameters and their values are stored per node in a Cell array in the Row object. Parameter names come from multiple sources (I am not 100% sure about the list below; the legitimate values are hard to trace):

      • From Preference definitions: freedisk, cores, heapUsage and sysLoadAvg
      • From Policy definitions: withCollection, port, ip_1, ip_2, ip_3, ip_4, freedisk, nodeRole, cores, sysLoadAvg, heapUsage, host, node, nodeset, metrics:* (* meaning any string)
      • Possibly from other places...

      "cores" and "freedisk" are always added to the Cell array in Row (see Policy.DEFAULT_PARAMS_OF_INTEREST). The Cell array is sorted by the natural ordering of the parameter names. Because every other parameter name listed above is lexicographically greater than these two, "cores" always ends up first and "freedisk" second.
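The ordering claim can be checked with a small standalone Java sketch. It does not use any Solr classes: it only sorts the parameter names listed above with String's natural ordering. The class name `CellOrderSketch` and the entry `metrics:example` (a stand-in for `metrics:*`) are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CellOrderSketch {

    // Returns the parameter names from the lists above, sorted by natural
    // (lexicographic) String ordering. "metrics:example" is a hypothetical
    // stand-in for the metrics:* family.
    static List<String> sortedParams() {
        List<String> names = new ArrayList<>(Arrays.asList(
                "withCollection", "port", "ip_1", "ip_2", "ip_3", "ip_4",
                "freedisk", "nodeRole", "cores", "sysLoadAvg", "heapUsage",
                "host", "node", "nodeset", "metrics:example"));
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Print the sorted names; the first two entries are the ones of interest.
        System.out.println(sortedParams());
    }
}
```

Under this ordering "cores" sorts to index 0 and "freedisk" to index 1, whatever other parameters from the list are present.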

      When rows are compared in Preference.compare() (used for sorting them), the value for a Preference is read from the Cell whose array index equals the Preference index (starting at 0, in declaration order).
      This only makes sense if the Cell array order is identical to the Preference list order. Preferences would therefore have to be declared in increasing order of parameter name, and the Cell array must not contain any parameter that is lexicographically smaller than the "highest" Preference without itself having a matching Preference.

      In practice, the check works when the preferences are the defaults: minimize the number of cores first, then maximize freedisk. But if, for example, cluster preferences are explicitly defined to maximize freedisk first and then minimize the number of cores, the check is broken: each Preference reads the other parameter's value. The problem is most visible when, as here, a parameter to maximize is swapped with a parameter to minimize.
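To make the failure mode concrete, here is a minimal Java sketch of the positional lookup described above. It is a simplified model and not the actual Solr code: `Pref`, `compareByIndex`, `compareByName` and the sample node values are all hypothetical. Rows are modeled as `double[]` arrays whose order follows the sorted Cell names (index 0 = cores, index 1 = freedisk).

```java
import java.util.Arrays;
import java.util.List;

public class PreferenceIndexSketch {

    // Hypothetical stand-in for a Preference: a parameter name and a direction.
    static final class Pref {
        final String name;
        final boolean maximize;

        Pref(String name, boolean maximize) {
            this.name = name;
            this.maximize = maximize;
        }
    }

    // Cell order after sorting by parameter name, as described in the report.
    static final List<String> CELL_NAMES = Arrays.asList("cores", "freedisk");

    // Positional lookup: preference i reads cell i, regardless of its name.
    // A negative result means rowA is preferred over rowB.
    static int compareByIndex(List<Pref> prefs, double[] rowA, double[] rowB) {
        for (int i = 0; i < prefs.size(); i++) {
            int c = Double.compare(rowA[i], rowB[i]);
            if (c != 0) {
                return prefs.get(i).maximize ? -c : c;
            }
        }
        return 0;
    }

    // Name-based lookup: each preference reads the cell with a matching name.
    static int compareByName(List<Pref> prefs, double[] rowA, double[] rowB) {
        for (Pref p : prefs) {
            int idx = CELL_NAMES.indexOf(p.name);
            int c = Double.compare(rowA[idx], rowB[idx]);
            if (c != 0) {
                return p.maximize ? -c : c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        double[] nodeA = {2, 500};   // cores = 2, freedisk = 500
        double[] nodeB = {10, 100};  // cores = 10, freedisk = 100

        List<Pref> defaults = Arrays.asList(
                new Pref("cores", false), new Pref("freedisk", true));
        List<Pref> swapped = Arrays.asList(
                new Pref("freedisk", true), new Pref("cores", false));

        System.out.println("defaults, by index: " + compareByIndex(defaults, nodeA, nodeB));
        System.out.println("swapped,  by index: " + compareByIndex(swapped, nodeA, nodeB));
        System.out.println("swapped,  by name:  " + compareByName(swapped, nodeA, nodeB));
    }
}
```

In this model, the default preferences pick node A under both lookups. With the swapped preferences, the positional lookup applies "maximize" to the cores cell and so prefers node B (more cores, less free disk), while the name-based lookup still prefers node A. Whether a name-based lookup is the right fix in Solr itself is left open here.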

      It is unclear to me what the real impact of this issue is.




            Assignee: Unassigned
            Reporter: Ilan Ginzburg