Description
Authorizing access to tables with a large number of columns can take a long time, as shown below. Authorizing a query on a table with 4000 columns takes about 100 seconds.
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
...
No rows selected (98.674 seconds)
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
...
No rows selected (10.4 seconds)
For each column referenced in the query, the Ranger Hive authorizer calls the metastore API to obtain the owner of the table, so the number of metastore calls grows with the number of columns. Calling the metastore API only once per table can significantly reduce the time taken to authorize queries.
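The once-per-table lookup described above could be sketched roughly as follows. This is only an illustration of the caching idea, not the actual Ranger change: TableOwnerCache, ColumnAuthorizationSketch, and the metastoreLookup function are hypothetical names, and the metastore call is replaced by a plain function so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not a Ranger class): remembers the owner of each table
// for the lifetime of one authorization request.
final class TableOwnerCache {
    // Stand-in for the metastore call that returns a table's owner (assumption).
    private final Function<String, String> metastoreLookup;
    private final Map<String, String> ownersByTable = new HashMap<>();
    private int lookupCount = 0;

    TableOwnerCache(Function<String, String> metastoreLookup) {
        this.metastoreLookup = metastoreLookup;
    }

    // Returns the table owner, invoking the lookup only the first time a table is seen.
    String ownerOf(String qualifiedTableName) {
        if (!ownersByTable.containsKey(qualifiedTableName)) {
            lookupCount++;
            ownersByTable.put(qualifiedTableName, metastoreLookup.apply(qualifiedTableName));
        }
        return ownersByTable.get(qualifiedTableName);
    }

    // Number of times the underlying lookup was actually invoked.
    int lookupCount() {
        return lookupCount;
    }
}

// Hypothetical driver showing how per-column authorization would use the cache.
final class ColumnAuthorizationSketch {
    // Resolves the owner once per column reference and returns how many
    // lookups reached the (simulated) metastore.
    static int authorizeColumns(String table, List<String> columns, TableOwnerCache cache) {
        for (String column : columns) {
            String owner = cache.ownerOf(table);
            // A real authorizer would evaluate policies for (table, column, owner) here.
        }
        return cache.lookupCount();
    }
}
```

With a cache like this, a query that references thousands of columns of a single table triggers one owner lookup instead of one per column. The cache is meant to be scoped to a single request so that a change of table owner is picked up by the next query.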
Here is the time taken to query the same tables with the Ranger Hive authorizer optimized to call the metastore API only once per table referenced in the query:
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
...
No rows selected (1.328 seconds)
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
...
No rows selected (0.194 seconds)