  1. Hive
  2. HIVE-21206

Bootstrap replication is slow because it opens a lot of metastore connections.


Details

    Description

      Hive bootstrap replication of 1 TB of data, on-prem to on-prem, runs slower in Hive3 than in Hive2.

      Time taken for bootstrap replication of a table with 1000 partitions:

      Hive2 - Hive2 bootstrap: 7m
      Hive3 - Hive3 bootstrap: 17m

      Every MoveTask closes the current metastore connection and opens a new one, which causes the slowdown, as the HiveServer2 log below shows:

      2019-02-08T12:28:30,174 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-5:MOVE] in serial mode
      2019-02-08T12:28:30,177 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table1 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/.hive-staging_hive_2019-02-08_12-28-23_584_1482331698286040936-3/-ext-10001
      2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
      2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.site@HWQE.HORTONWORKS.COM (auth:KERBEROS) retries=24 delay=5 lifetime=0
      2019-02-08T12:28:30,325 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1]: provider.BaseAuditHandler (:()) - Audit Status Log: name=hiveServer2.async.multi_dest.batch, finalDestination=hiveServer2.async.multi_dest.batch.solr, interval=01:00.002 minutes, events=2, succcessCount=1, totalEvents=56, totalSuccessCount=25
      2019-02-08T12:28:30,520 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/base_0000001
      2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-11:MOVE] in serial mode
      2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Closed a connection to metastore, current connections: 3
      2019-02-08T12:28:31,246 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table2 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/.hive-staging_hive_2019-02-08_12-28-23_810_7457138692783022870-3/-ext-10002
      2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
      2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
      2019-02-08T12:28:31,336 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
      2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
      2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.site@HWQE.HORTONWORKS.COM (auth:KERBEROS) retries=24 delay=5 lifetime=0
      2019-02-08T12:28:31,642 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/base_0000001
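
The pattern visible in the log (a close followed by a fresh open around each MOVE stage) can be illustrated with a minimal sketch. This is not Hive code and not the fix in the attached patches: `FakeMetastoreClient`, `reopenPerTask`, and `reuseAcrossTasks` are hypothetical names, and the fake client only counts how many times a connection would be opened. It assumes each open carries a fixed handshake cost, so the open count stands in for the overhead.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionReuseSketch {

    /** Hypothetical stand-in for a metastore client; it only counts opens. */
    static final class FakeMetastoreClient implements AutoCloseable {
        static final AtomicInteger OPEN_COUNT = new AtomicInteger();
        private boolean open;

        FakeMetastoreClient() {
            // A real client would pay for the Thrift/Kerberos handshake here.
            OPEN_COUNT.incrementAndGet();
            open = true;
        }

        void loadTable(String table) {
            if (!open) {
                throw new IllegalStateException("client is closed");
            }
            // No real work: the sketch only tracks connection churn.
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Mirrors the reported behavior: every task closes and reopens the connection. */
    static int reopenPerTask(int tasks) {
        FakeMetastoreClient.OPEN_COUNT.set(0);
        for (int i = 0; i < tasks; i++) {
            try (FakeMetastoreClient client = new FakeMetastoreClient()) {
                client.loadTable("table" + i);
            }
        }
        return FakeMetastoreClient.OPEN_COUNT.get();
    }

    /** Alternative: one connection is opened once and shared by all tasks. */
    static int reuseAcrossTasks(int tasks) {
        FakeMetastoreClient.OPEN_COUNT.set(0);
        try (FakeMetastoreClient client = new FakeMetastoreClient()) {
            for (int i = 0; i < tasks; i++) {
                client.loadTable("table" + i);
            }
        }
        return FakeMetastoreClient.OPEN_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("reopen per task: " + reopenPerTask(1000) + " opens");
        System.out.println("reuse across tasks: " + reuseAcrossTasks(1000) + " opens");
    }
}
```

With one move per partition, the number of opens in the first variant grows linearly with the partition count, while the second stays constant. How the real client should be shared (per session, per thread, or otherwise) is a design decision this sketch does not address.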
      

      Attachments

        1. HIVE-21206.03.patch
          5 kB
          Sankar Hariappan
        2. HIVE-21206.02.patch
          4 kB
          Sankar Hariappan
        3. HIVE-21206.01.patch
          2 kB
          Sankar Hariappan


          People

            Assignee: Sankar Hariappan (sankarh)
            Reporter: Sankar Hariappan (sankarh)
            Votes: 0
            Watchers: 3


              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 20m
