KYLIN-5044

Kylin 3.1.2 - Cube processing fails on Step 4 when hive client is switched to beeline.

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem
    • v3.1.2
    • None
    • Job Engine
    • None
    • HDP-2.6.5.0 (2.6.5.0-292)
      Kylin 3.1.2
      CentOS 7
    • Newcomer (Easy) - Everyone can do this

    Description

      I switched the Hive client to beeline and also enabled Spark SQL for the Hive source, using the settings below in kylin.properties:

      kylin.source.hive.client=beeline
      kylin.source.hive.beeline-shell=beeline
      kylin.source.hive.beeline-params=-n hive -u jdbc:hive2://hdp...:10016
      kylin.source.hive.enable-sparksql-for-table-ops=true
      kylin.source.hive.sparksql-beeline-shell=beeline

      This caused step 4 (#4 Step Name: Build Dimension Dictionary) of cube processing to fail, as well as the lookup refresh. The error:

      org.apache.kylin.engine.mr.exception.HadoopShellException: java.io.IOException: java.lang.NullPointerException
          at org.apache.kylin.source.hive.HiveTable.getSignature(HiveTable.java:78)
          at org.apache.kylin.dict.lookup.SnapshotTable.<init>(SnapshotTable.java:73)
          at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:140)
          at org.apache.kylin.cube.CubeManager$DictionaryAssist.buildSnapshotTable(CubeManager.java:1260)
          at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:1164)
          at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:123)
          at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
          at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
          at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
          at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:64)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
          at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:72)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
          at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:118)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.NullPointerException
          at org.apache.kylin.common.util.HadoopUtil.fixWindowsPath(HadoopUtil.java:122)
          at org.apache.kylin.common.util.HadoopUtil.makeURI(HadoopUtil.java:114)
          at org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:92)
          at org.apache.kylin.engine.mr.DFSFileTable.getSizeAndLastModified(DFSFileTable.java:90)
          at org.apache.kylin.source.hive.HiveTable.getSignature(HiveTable.java:63)
          ... 16 more
      result code:2
          at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:74)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
          at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:72)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:180)
          at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:118)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)

      Investigation showed that my beeline returns table metadata in a different format from the one the code in BeelineHiveClient.java expects. As a result, sdLocation was not retrieved, which led to the NPE. This happens on the LEARN_KYLIN sample project. I wonder whether this has something to do with my environment or is simply a bug (it seems severe).
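
      To illustrate the failure mode, here is a minimal sketch of a defensive location lookup. It assumes the metadata arrives as (label, value) rows and that the label may or may not carry padding or a trailing colon depending on the server. The names LocationLookupSketch, findLocation and requireLocation, as well as the sample rows and paths, are hypothetical; this is not the code in BeelineHiveClient.java or in the proposed PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of Kylin.
class LocationLookupSketch {

    // Scans (label, value) rows for a "Location" entry, tolerating
    // surrounding whitespace, letter case and an optional trailing colon.
    static Optional<String> findLocation(List<String[]> rows) {
        for (String[] row : rows) {
            if (row == null || row.length < 2 || row[0] == null || row[1] == null) {
                continue; // skip separator or malformed rows
            }
            String label = row[0].trim();
            if (label.endsWith(":")) {
                label = label.substring(0, label.length() - 1).trim();
            }
            String value = row[1].trim();
            if (label.equalsIgnoreCase("Location") && !value.isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Fails with a descriptive message instead of passing null downstream,
    // where it would otherwise surface as a bare NullPointerException.
    static String requireLocation(String table, List<String[]> rows) {
        return findLocation(rows).orElseThrow(() -> new IllegalStateException(
                "No storage location found in metadata for table " + table));
    }

    public static void main(String[] args) {
        // Assumed sample rows: one label with a colon and padding, one without.
        List<String[]> withColon = new ArrayList<>();
        withColon.add(new String[] {"Location:           ", "hdfs://example/warehouse/t1"});

        List<String[]> withoutColon = new ArrayList<>();
        withoutColon.add(new String[] {"Location", "hdfs://example/warehouse/t2"});

        System.out.println(requireLocation("t1", withColon));
        System.out.println(requireLocation("t2", withoutColon));
    }
}
```

      Returning an Optional and failing early with the table name in the message is a design choice for the sketch: a missing location is reported where the metadata is parsed, rather than later inside HadoopUtil as in the trace above.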

      Proposed code changes that resolved the issue for me locally: https://github.com/apache/kylin/pull/1698/files (the PR needs to be rebased with the proper KYLIN Jira ID in the commit and retargeted to another branch, ideally 3.1.3).

      Attachments

        1. 2021-07-26 21_56_39-Window.png
          66 kB
          Piotr Naszarkowski

          People

            Assignee: Unassigned
            Reporter: Piotr Naszarkowski (pnaszarkowski)
            Votes: 0
            Watchers: 2