  1. Hive
  2. HIVE-26634

[Hive][Spark] EntityNotFoundException: "Database global_temp not found" when connecting the Hive metastore to AWS Glue

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • None
    • Not Applicable
    • None
    • None
    • Important

    Description

      While running our batch jobs with Apache Spark and Hive on an EMR cluster that uses AWS Glue as the metastore, we hit the following error:

      EntityNotFoundException, Database global_temp not found
      2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed compiling command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 0.02 seconds
      2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: reexec.ReExecDriver (:()) - Execution #1 of query
      2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
      2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): show views
      2022-10-09T10:36:31,263 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
      2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 1.008 seconds
      2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - OK
      2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
      2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_0:0,
      2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: CliDriver (:()) - Time taken: 1.028 seconds
      2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
      2022-10-09T10:36:32,272 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: session.SessionState (SessionState.java:resetThreadName(452)) - Resetting thread name to  main
      2022-10-09T10:36:46,512 INFO  [main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
      2022-10-09T10:36:46,513 INFO  [main([])]: session.SessionState (SessionState.java:updateThreadName(441)) - Updating thread name to 573c4ce0-f73c-439b-829d-1f0b25db45ec main
      2022-10-09T10:36:46,515 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Compiling command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): use global_temp
      2022-10-09T10:36:46,530 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
      2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - FAILED: SemanticException [Error 10072]: Database does not exist: global_temp
      org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: global_temp
              at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171)
              at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413)
              at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516)
              at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
              at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
              at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
              at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
              at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
              at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
              at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
              at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
              at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
              at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:236) 
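
To make it easier to see which statement triggers the failure in a long Hive log, the following is a minimal sketch that pairs each `FAILED:` line with the most recent `Compiling command(...)` line. The function name `failed_statements` and the two regular expressions are illustrative assumptions based only on the log format shown above; they are not part of Hive.

```python
import re

# Patterns derived from the log excerpt above (an assumption about the format).
# "Completed compiling command(...)" is deliberately not matched (lowercase "c").
COMPILING = re.compile(r"Compiling command\(queryId=(?P<qid>[^)]+)\): (?P<stmt>.*)$")
FAILED = re.compile(r" ERROR .*FAILED: (?P<error>.*)$")


def failed_statements(log_lines):
    """Return (statement, error) pairs for commands that failed to compile."""
    last_stmt = None
    failures = []
    for raw in log_lines:
        line = raw.rstrip("\n")
        match = COMPILING.search(line)
        if match:
            # Remember the statement so a later FAILED line can be attributed to it.
            last_stmt = match.group("stmt")
            continue
        match = FAILED.search(line)
        if match:
            failures.append((last_stmt, match.group("error")))
    return failures


# Two lines copied from the log above.
sample_log = [
    "2022-10-09T10:36:46,515 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: "
    "ql.Driver (:()) - Compiling command("
    "queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): use global_temp",
    "2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: "
    "ql.Driver (:()) - FAILED: SemanticException [Error 10072]: "
    "Database does not exist: global_temp",
]

for statement, error in failed_statements(sample_log):
    print(f"{statement!r} failed with: {error}")
```

On the excerpt above this attributes the `SemanticException` to the `use global_temp` statement, while the earlier `show views` command completes normally.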

      global_temp is a system-reserved database that the Spark session uses to hold global temporary views.
      We have not created this database in AWS Glue, because creating it there makes all of our EMR jobs fail with this error:

      ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: global_temp is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again. 

      We do not create or use any global temporary views in our project, but Spark itself appears to look up this database as a check while initializing the Spark session.

      EMR configuration used:

      [
         {
            "Classification":"hive-site",
            "Properties":{
               "hive.msck.path.validation":"ignore",
               "hive.exec.max.dynamic.partitions":"1000000",
               "hive.vectorized.execution.enabled":"true",
               "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
               "hive.exec.dynamic.partition.mode":"nonstrict",
               "hive.exec.max.dynamic.partitions.pernode":"500000"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"yarn-site",
            "Properties":{
               "yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
               "yarn.log-aggregation.retain-seconds":"-1",
               "yarn.scheduler.fair.allow-undeclared-pools":"true",
               "yarn.log-aggregation-enable":"true",
               "yarn.scheduler.fair.user-as-default-queue":"true",
               "yarn.nodemanager.remote-app-log-dir":"LOGS_PATH",
               "yarn.scheduler.fair.preemption":"true",
               "yarn.scheduler.fair.preemption.cluster-utilization-threshold":"0.8",
               "yarn.resourcemanager.am.max-attempts":"10"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"mapred-site",
            "Properties":{
               "mapred.jobtracker.taskScheduler":"org.apache.hadoop.mapred.FairScheduler"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"presto-connector-hive",
            "Properties":{
               "hive.recursive-directories":"true",
               "hive.metastore.glue.datacatalog.enabled":"true"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"spark-log4j",
            "Properties":{
               "log4j.logger.com.project":"DEBUG",
               "log4j.appender.rolling.layout":"org.apache.log4j.PatternLayout",
               "log4j.logger.org.apache.spark":"WARN",
               "log4j.appender.rolling.encoding":"UTF-8",
               "log4j.appender.rolling.layout.ConversionPattern":"%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
               "log4j.appender.rolling.maxBackupIndex":"5",
               "log4j.appender.rolling":"org.apache.log4j.RollingFileAppender",
               "log4j.rootLogger":"WARN, rolling",
               "log4j.logger.org.eclipse.jetty":"WARN",
               "log4j.appender.rolling.maxFileSize":"1000MB",
               "log4j.appender.rolling.file":"${spark.yarn.app.container.log.dir}/spark.log"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"emrfs-site",
            "Properties":{
               "fs.s3.maxConnections":"10000"
            },
            "Configurations":[
               
            ]
         },
         {
            "Classification":"spark-hive-site",
            "Properties":{
               "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
            "Configurations":[
               
            ]
         }
      ] 
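
As a quick sanity check on a configuration like the one above, the sketch below lists which classifications select the Glue client factory through `hive.metastore.client.factory.class`. The helper name `glue_enabled_classifications` is hypothetical, and the embedded JSON is a trimmed copy of the configuration in this report, not the full file.

```python
import json

FACTORY_KEY = "hive.metastore.client.factory.class"
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_enabled_classifications(configurations):
    """Return the classifications whose properties select the Glue client factory."""
    return [
        entry["Classification"]
        for entry in configurations
        if entry.get("Properties", {}).get(FACTORY_KEY) == GLUE_FACTORY
    ]


# Trimmed copy of the EMR configuration above (only the relevant entries).
emr_config = json.loads(
    """
    [
      {"Classification": "hive-site",
       "Properties": {"hive.metastore.client.factory.class":
         "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}},
      {"Classification": "presto-connector-hive",
       "Properties": {"hive.metastore.glue.datacatalog.enabled": "true"}},
      {"Classification": "spark-hive-site",
       "Properties": {"hive.metastore.client.factory.class":
         "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}
    ]
    """
)

print(glue_enabled_classifications(emr_config))
```

In this report both `hive-site` and `spark-hive-site` point at the Glue factory, so the Hive CLI and Spark both resolve databases through Glue; Presto is switched to Glue by a separate property.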

      The spark-submit command is:

       spark-submit --deploy-mode cluster --master yarn \
         --conf spark.yarn.appMasterEnv.ENV=DEV \
         --conf spark.executorEnv.ENV=DEV \
         --conf spark.network.timeout=6000s \
         --conf spark.sql.catalogImplementation=hive \
         --conf spark.driver.memory=15g \
         --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
         --class CLASS_NAME JAR_FILE_PATH
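
The Spark error quoted in the description suggests setting a different value for `spark.sql.globalTempDatabase`. The sketch below shows one way to add that setting to the command from this report. The helper `spark_submit_args` and the database name `spark_global_temp` are hypothetical, and whether the override removes the Glue lookup error was not verified in this issue.

```python
import shlex

GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def spark_submit_args(class_name, jar_path, global_temp_db=None):
    """Build the spark-submit argument list used in this report.

    global_temp_db is optional; when given, it is passed as
    spark.sql.globalTempDatabase (the setting named in the Spark error).
    """
    conf = {
        "spark.yarn.appMasterEnv.ENV": "DEV",
        "spark.executorEnv.ENV": "DEV",
        "spark.network.timeout": "6000s",
        "spark.sql.catalogImplementation": "hive",
        "spark.driver.memory": "15g",
        "spark.hadoop.hive.metastore.client.factory.class": GLUE_FACTORY,
    }
    if global_temp_db is not None:
        conf["spark.sql.globalTempDatabase"] = global_temp_db

    args = ["spark-submit", "--deploy-mode", "cluster", "--master", "yarn"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += ["--class", class_name, jar_path]
    return args


# "spark_global_temp" is an illustrative name, not one taken from the report.
print(shlex.join(spark_submit_args("CLASS_NAME", "JAR_FILE_PATH", "spark_global_temp")))
```

Building the arguments as a list keeps each `--conf` pair intact and avoids shell-quoting mistakes when the command is launched from a script.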
      

          People

            Assignee: Unassigned
            Reporter: Mahmood Abu Awwad (mahmoodabuawwad)
            Votes: 0
            Watchers: 2