  1. Kylin
  2. KYLIN-4206

Build kylin on EMR 5.23. The kylin version is 2.6.4. When building the cube, the hive table cannot be found

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v2.6.4
    • Fix Version/s: v3.1.0, v3.0.2, v2.6.6
    • Component/s: Environment
    • Labels: None
    • Environment: EMR 5.23 (Hadoop 2.8.5, HBase 1.4.9, Hive 2.3.4, Spark 2.4.0, Tez 0.9.1, HCatalog 2.3.4, ZooKeeper 3.4.13), Kylin 2.6.4

    Description

      Hi,

      I deployed Kylin 2.6.4 on EMR 5.23. When building a cube, the Hive table cannot be found. The detailed error is as follows:

      java.lang.RuntimeException: java.io.IOException: NoSuchObjectException(message:kylin_flat_db_test1.kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba table not found)
          at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:83)
          at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
          at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
          at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
          at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
          at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
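
      To compare what Hive's own metastore and the Glue catalog each report, it helps to pull the exact database and table name out of the error. The following is only a minimal sketch; the helper name `missing_table` and the regular expression are illustrative assumptions based on the message format shown in the trace above, not part of Kylin.

```python
import re

# Assumed message format, taken from the trace above:
# NoSuchObjectException(message:<db>.<table> table not found)
NOT_FOUND = re.compile(
    r"NoSuchObjectException\(message:(\w+)\.(\w+) table not found\)"
)

def missing_table(trace: str):
    """Return (database, table) from the first NoSuchObjectException, or None."""
    match = NOT_FOUND.search(trace)
    return (match.group(1), match.group(2)) if match else None

# The error text from this report, split across lines for readability.
trace = (
    "java.lang.RuntimeException: java.io.IOException: "
    "NoSuchObjectException(message:kylin_flat_db_test1."
    "kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba"
    " table not found)"
)

result = missing_table(trace)
if result is not None:
    database, table = result
    print("database:", database)
    print("table:", table)
```

      The two names can then be looked up separately in Hive and in the Glue catalog to see which side is missing the intermediate table.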

      On EMR, the Hive metadata is shared through Glue, and the metastore URL is configured in hive-site.xml:

      <property>
      <name>hive.metastore.uris</name>
      <value>thrift://ip-172-40-15-164.ec2.internal:9083</value>
      <description>JDBC connect string for a JDBC metastore</description>
      </property>

      <property>
      <name>hive.metastore.client.factory.class</name>
      <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
      </property>

      However, when I use Hive's own metastore instead of sharing metadata through Glue, by commenting out the following configuration, the exception does not occur:
      <!--<property>
      <name>hive.metastore.client.factory.class</name>
      <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
      </property>

      -->
      However, EMR relies on the shared metadata, so with metadata sharing disabled I cannot query the other Hive tables built by the cluster.
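
      Whether a given hive-site.xml routes the metastore client through Glue can be checked with a short script. This is a minimal sketch, not Kylin or EMR tooling: the function names `read_hive_site` and `uses_glue` are hypothetical, and the sample assumes the usual `<configuration>` root element around the properties quoted above. XML comments are skipped by the parser, so a commented-out property counts as disabled.

```python
import xml.etree.ElementTree as ET

# Factory class name as it appears in the configuration quoted in this report.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)

def read_hive_site(xml_text: str) -> dict:
    """Map each <property> name to its value; commented-out blocks are ignored."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

def uses_glue(props: dict) -> bool:
    """True when the metastore client factory is the Glue catalog factory."""
    return props.get("hive.metastore.client.factory.class") == GLUE_FACTORY

# Sample mirroring the two properties shown in this report.
sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://ip-172-40-15-164.ec2.internal:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.factory.class</name>
    <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
  </property>
</configuration>"""

props = read_hive_site(sample)
print("metastore URIs:", props.get("hive.metastore.uris"))
print("Glue catalog enabled:", uses_glue(props))
```

      Running the same check against the configuration that Kylin actually loads would show which metastore the cube build job is talking to.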

      The configuration files are attached. Please help me solve this problem. Thank you.

      Best regards.

      Note:

      For anyone interested in Glue support, kaige's comment at https://issues.apache.org/jira/browse/KYLIN-3685?focusedCommentId=17002995&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17002995 describes another verified workaround.

      Attachments

        1. kylin_hive_conf.xml
          4 kB
          rongneng.wei
        2. kylin_job_conf.xml
          3 kB
          rongneng.wei
        3. kylin.properties
          14 kB
          rongneng.wei

            People

              Assignee: rongneng.wei
              Reporter: rongneng.wei
              Votes: 0
              Watchers: 4