  1. Ranger
  2. RANGER-4741

Hive plugin optimization to avoid excessive metastore API calls

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.5.0
    • Component/s: plugins
    • Labels: None

    Description

      Authorizing access to tables with a large number of columns can take a long time, as shown below. A query on a table with 4000 columns takes about 100 seconds.

      0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
      ...
      No rows selected (98.674 seconds)
      
      
      0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
      ...
      No rows selected (10.4 seconds)
      

       

      For each column referenced in the query, the Ranger Hive authorizer calls the metastore API to obtain the owner of the table. Calling the metastore API once per table instead can significantly reduce the time taken to authorize queries.
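      The idea can be illustrated with a minimal Java sketch. It is not the code from the attached patch: the names OwnerLookupSketch, ColumnRef, CountingMetastore and resolveOwners are hypothetical, and the metastore is modeled as a plain function from a table key to an owner name. The sketch assumes that the table owner does not change while a single query is being authorized, so a per-query map can hold the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the Ranger code base.
class OwnerLookupSketch {

    /** Stand-in for one column-level resource in an authorization request. */
    static final class ColumnRef {
        final String database;
        final String table;
        final String column;

        ColumnRef(String database, String table, String column) {
            this.database = database;
            this.table = table;
            this.column = column;
        }

        /** Key shared by all columns of the same table. */
        String tableKey() {
            return database + "." + table;
        }
    }

    /** Fake metastore that counts lookups; each call models one API round trip. */
    static final class CountingMetastore implements Function<String, String> {
        int calls = 0;

        @Override
        public String apply(String tableKey) {
            calls++;
            return "owner_of_" + tableKey;
        }
    }

    /** Resolves the owner for every column, asking the metastore at most once per table. */
    static List<String> resolveOwners(List<ColumnRef> columns, Function<String, String> metastore) {
        Map<String, String> ownerByTable = new HashMap<>(); // scoped to a single query
        List<String> owners = new ArrayList<>();
        for (ColumnRef col : columns) {
            // computeIfAbsent invokes the metastore only for a table key not seen before
            owners.add(ownerByTable.computeIfAbsent(col.tableKey(), metastore));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<ColumnRef> columns = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            columns.add(new ColumnRef("default", "large_tbl_4000", "col_" + i));
        }
        CountingMetastore metastore = new CountingMetastore();
        resolveOwners(columns, metastore);
        System.out.println("columns: " + columns.size() + ", metastore calls: " + metastore.calls);
    }
}
```

      In this sketch the map is created inside resolveOwners, so nothing is cached across queries and an ownership change is picked up by the next query. Whether the actual patch scopes its lookup the same way is not stated in this issue.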

      Here is the time taken to query the same tables with the Ranger Hive authorizer optimized to call the metastore API only once per table referenced in the query:

      0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
      ...
      No rows selected (1.328 seconds)
      
      
      0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
      ...
      No rows selected (0.194 seconds)
      

      Attachments

        1. RANGER-4741.patch
          15 kB
          Madhan Neethiraj

          People

            Assignee: Madhan Neethiraj
            Reporter: Madhan Neethiraj
            Votes: 0
            Watchers: 2
