Affects Version/s: 2.1.0
Fix Version/s: None
There are 2 issues with the current patch - PatchForPrestoToSupportPresto333_J10038.java
- The patch is not working as expected, i.e. it does not update the presto service definition schema in the database to the latest version
- Although the patch does not work, it still return successfully and the Ranger patching subsystem thinks that it went successfully and updates the status to 'Y' in x_db_version_h. This is a logical error as it the patch should return false and thus signal that there is a problem. Although an exception is thrown in $RANGER_ADMIN_HOME/ews/logs/ranger_db_patch.log it is generic and just says that there was an error thrown but the stacktrace does not tell us what the cause of the error is.
I will explain the cause of issue 1 in the lines below. Regarding issue 2, this looks like it is a systematic problem related to all Ranger Java patches. The java patches all have similar structure:
- if there is an error thrown Runtime Exception
- catch all exceptions (including the one above) and log an error message
Since we are catching our own exception and just logging it, the Ranger patch subsystem thinks that the patch went through and it updates the version table x_db_version_h and marks the patch as applied REGARDLESS of whether it was applied or not. A poorly written patch will just pass as well as a very well written patch and both will be recorded as 'Y' in the x_db_version_h table which means the patch was applied. I can't comment on why this was decided to be so and why every patch contributor followed suit.
Regarding issue 1:
Adding a simple
in the try/catch block shows that the error is thrown by the RangerServiceDefValidator class
Looking the validator RangerServiceDefValidator it is clear that there is specific check if the ACTION_TYPE is UPDATE to check if we are updating the access_type name or id with this update. If we are trying to do that an exception is throown and validation fails. Looks at the git commit history, it shows that this commit added this validation - f2e148abbe, which makes sense.
RANGER-2218: Added validations for names duing service def updates
Now to understand why we want to update the access_type names and/or ids we need to check the evolution of the presto service definition.
|43757e7987||RANGER-2395||Add Presto plugin, This implements a plugin for Presto, a distributed SQL engine.|
|a4d1bed527||RANGER-2502||Correct service definition for presto|
|a15e49fb53||RANGER-2754||upgrade presto plugin to support row-filtering and column-masking and for changes in 317|
|454537a954||RANGER-2826||updated Presto plugin to support PrestoSQL version 333|
This table lists the commits that touched the presto service definition. I noticed that ranger service definition are usually updated with a backwards compatibility, i.e. when we add new resources/access_types we append them to the schema with an ever increasing ids, when we delete an existing resource/access_types we remove it from the schema and DON'T reuse it's id ever again. If we need to update a resource/access_type we perform to actions - deleta and then add/insert (so we are not actually updating the same entity). This ensures that schema evolves naturally and is kind of backwards compatible. It also ensures that current policies are not disrupted during a schema update as if item ids or names change after an upgrade it will invalidate the policies they are associated to.
There were 2 updates to the presto service definition that did not follow this schema evolution.
RANGER-2502/a4d1bed527 updated the access_type name from update to insert.
RANGER-2826/454537a954 removed/reordered/replace 4 access_types in the presto service definition - use,delete,alter,grant (which caused the RangerServiceDefValidator to throw an error)
Looking at some existing policy definitions in our cluster, the access_types are defined by name rather than by id, so updating the item id order in x_access_type_def table should not break any existing policies. Our current approach was to comment
which effectively means there was no service def validation run during the patch but it also meant that the patch got applied. A better approach will be to add inside the PatchForPrestoToSupportPresto333_J10038.java file a version of RangerServiceDefValidator class with a stripped down check on access_type name/id check, so we can perform a basic validation against the patch before applying.