Hive / HIVE-10319

Hive CLI startup takes a long time with a large number of databases

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: CLI
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The Hive CLI takes a long time to start when the data warehouse contains a large number of databases. I think the root cause is the way permanent UDFs are loaded from the metastore. From the logs and the source code (see Hive.java), I see that at startup Hive first fetches all databases from the metastore and then, for each database, makes a separate metastore call to get that database's permanent functions. The number of metastore calls is therefore on the order of the number of databases. In production we have several hundred databases, so Hive makes several hundred RPC calls during startup, which takes 30+ seconds.
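
The call pattern described above can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: FakeMetastore, its methods, and the rpcCount counter are hypothetical stand-ins, not Hive's actual metastore client API, and the batched variant is only one possible remedy, not necessarily what the attached patches implement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a metastore; every method call counts as one RPC. */
class FakeMetastore {
    private final Map<String, List<String>> functionsByDb = new LinkedHashMap<>();
    int rpcCount = 0;

    FakeMetastore(int databaseCount) {
        // One permanent function per database, purely for illustration.
        for (int i = 0; i < databaseCount; i++) {
            functionsByDb.put("db" + i, Collections.singletonList("db" + i + ".my_udf"));
        }
    }

    List<String> getAllDatabases() {
        rpcCount++;
        return new ArrayList<>(functionsByDb.keySet());
    }

    List<String> getFunctions(String db) {
        rpcCount++;
        return functionsByDb.get(db);
    }

    /** Assumed batched call that returns the functions of all databases at once. */
    List<String> getAllFunctions() {
        rpcCount++;
        List<String> all = new ArrayList<>();
        for (List<String> fns : functionsByDb.values()) {
            all.addAll(fns);
        }
        return all;
    }
}

class StartupRpcSketch {
    /** Pattern from the description: one call for the database list, then one per database. */
    static List<String> loadPerDatabase(FakeMetastore ms) {
        List<String> functions = new ArrayList<>();
        for (String db : ms.getAllDatabases()) {
            functions.addAll(ms.getFunctions(db));
        }
        return functions;
    }

    /** Alternative: a single batched call, independent of the number of databases. */
    static List<String> loadBatched(FakeMetastore ms) {
        return ms.getAllFunctions();
    }

    public static void main(String[] args) {
        FakeMetastore perDb = new FakeMetastore(300);
        loadPerDatabase(perDb);
        System.out.println("per-database RPCs: " + perDb.rpcCount);

        FakeMetastore batched = new FakeMetastore(300);
        loadBatched(batched);
        System.out.println("batched RPCs: " + batched.rpcCount);
    }
}
```

With 300 simulated databases, the per-database loop issues 301 calls (one for the database list plus one per database), while the batched variant issues a single call. Real startup time also depends on per-call latency, which this sketch does not model.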

      Attachments

        1. HIVE-10319.patch (6.46 MB, Nezih Yigitbasi)
        2. HIVE-10319.6.patch (805 kB, Nezih Yigitbasi)
        3. HIVE-10319.5.patch (699 kB, Nezih Yigitbasi)
        4. HIVE-10319.4.patch (685 kB, Nezih Yigitbasi)
        5. HIVE-10319.3.patch (7.41 MB, Nezih Yigitbasi)
        6. HIVE-10319.2.patch (6.47 MB, Nezih Yigitbasi)
        7. HIVE-10319.1.patch (6.46 MB, Nezih Yigitbasi)

            People

              Assignee: Nezih Yigitbasi (nezihyigitbasi)
              Reporter: Nezih Yigitbasi (nezihyigitbasi)
              Votes: 0
              Watchers: 9
