  1. Sentry (Retired)
  2. SENTRY-2556

Provide prefer-local option to improve performance when Hive on S3 is used in conjunction with Sentry HA and Sentry-HDFS sync

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Sentry
    • Labels:
      None

      Description

      Performance degrades when Hive on S3 is used in conjunction with Sentry HA and Sentry-HDFS sync and both of the following hold: 1) the Hive Metastore Server (HMS) is connected (via its Sentry client) to a remote Sentry Server, and 2) HiveServer2 is connected (via its Sentry client) to a local Sentry Server.

      TO REPRODUCE:

      1. Set up Sentry HA with HDFS sync
      2. Configure Hive and HDFS to use S3
      3. Create an external table in S3

      EXAMPLE: CREATE EXTERNAL TABLE mytesttable (firstname STRING, lastname STRING, address STRING, city STRING, state STRING, zip int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3a://ajy-sentry/';

      RESULT: Creating a table in S3 can take a very long time (two orders of magnitude slower than table creation in HDFS). Note that this does not always occur (see below for more detail).

      To force a test system into the condition that causes the performance degradation: 

      1. For each HiveServer2 instance, set the sentry.service.client.server.rpc-addresses property to a single value (local to the HiveServer2 instance) and then restart that HiveServer2 instance
      2. For each HMS instance, set the sentry.service.client.server.rpc-addresses property to a single value (remote to the HMS instance) and then restart that HMS instance

      -------------

      I think the needed code change is to provide a prefer-local option on the SentryTransportPool and/or the SentryGenericServiceClientDefaultImpl so that, when the HMS is on the same node as one of the Sentry servers, the local Sentry server is used. Testing would be needed to determine whether this should become the default behavior or be user-configurable for specific situations.
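
      As a rough illustration of the prefer-local idea, the sketch below reorders a list of configured Sentry server addresses so that any address on the local host is tried first. It is not Sentry code: the class PreferLocalSketch, its methods, the example host names, and the port are all hypothetical, and how such an ordering would be wired into SentryTransportPool is left open.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Sentry: orders "host:port" endpoints local-first.
final class PreferLocalSketch {

    // Returns the endpoints with local ones first; relative order is otherwise preserved.
    static List<String> preferLocal(List<String> endpoints, Set<String> localHostNames) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (String endpoint : endpoints) {
            String host = hostOf(endpoint).toLowerCase(Locale.ROOT);
            if (localHostNames.contains(host)) {
                local.add(endpoint);
            } else {
                remote.add(endpoint);
            }
        }
        local.addAll(remote);
        return local;
    }

    // Extracts the host part of "host:port"; returns the input unchanged if there is no port.
    static String hostOf(String endpoint) {
        int colon = endpoint.lastIndexOf(':');
        return colon < 0 ? endpoint : endpoint.substring(0, colon);
    }

    // Best-effort set of lower-cased names under which this machine may be configured.
    static Set<String> localHostNames() {
        Set<String> names = new HashSet<>();
        names.add("localhost");
        try {
            InetAddress self = InetAddress.getLocalHost();
            names.add(self.getHostName().toLowerCase(Locale.ROOT));
            names.add(self.getCanonicalHostName().toLowerCase(Locale.ROOT));
        } catch (UnknownHostException e) {
            // Local name lookup failed; fall back to "localhost" only.
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative addresses; a real caller would pass localHostNames() instead of a fixed set.
        List<String> configured = List.of("sentry1.example.com:8038", "sentry2.example.com:8038");
        System.out.println(preferLocal(configured, Set.of("sentry2.example.com")));
        // prints [sentry2.example.com:8038, sentry1.example.com:8038]
    }
}
```

      Keeping the remote servers at the end of the list, rather than dropping them, is meant to leave failover to another Sentry server possible when the local one is unavailable.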

            People

            • Assignee:
              Unassigned
              Reporter:
              anthony.young-garner@cloudera.com Anthony Young-Garner