Kylin / KYLIN-1985

SnapshotTable should only keep the columns described in tableDesc

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.5.3
    • Fix Version/s: v1.5.4
    • Component/s: Job Engine
    • Labels:
      None

      Description

      We ran into a strange problem: building or refreshing a cube failed with a java.lang.ArrayIndexOutOfBoundsException. The exception stack looked like this:

      java.lang.IllegalStateException: Failed to load lookup table DIM_TABLE_NAME from snapshot /table_snapshot/dim_table_name/5a78a522-6f85-4650-b47d-6a5f5806b7f7.snapshot
      at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:621)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:61)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
      at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
      at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
      at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 19
      at org.apache.kylin.dict.lookup.LookupStringTable.convertRow(LookupStringTable.java:85)
      at org.apache.kylin.dict.lookup.LookupStringTable.convertRow(LookupStringTable.java:34)
      at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:76)
      at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:67)
      at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
      at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:55)
      at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
      at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:619)
      ... 13 more

      A similar exception occurred when querying by a lookup table dimension:

      ERROR [http-bio-7070-exec-7] controller.QueryController:209 : Exception when execute sql
      at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
      at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
      at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:143)
      at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
      at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:366)
      at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:278)
      at org.apache.kylin.rest.service.QueryService.query(QueryService.java:121)
      at org.apache.kylin.rest.service.QueryService$$FastClassByCGLIB$$4957273f.invoke(<generated>)
      at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
      at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:618)
      at org.apache.kylin.rest.service.QueryService$$EnhancerByCGLIB$$315e2079.query(<generated>)
      at org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:192)
      at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:94)
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 19
      at org.apache.kylin.dict.lookup.LookupStringTable.convertRow(LookupStringTable.java:85)
      at org.apache.kylin.dict.lookup.LookupStringTable.convertRow(LookupStringTable.java:34)
      at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:76)
      at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:67)
      at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
      at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:55)
      at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
      at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:619)

      From the exception message, we found that one lookup table had been changed in Hive (columns were added) and the change had not been synchronized with Kylin. However, the cause of this problem is subtle and not easy to find.
      In SnapshotTable, checking only 'row.length <= maxIndex' in the takeSnapshot method to detect a 'Bad hive table row' is not enough: it catches rows that are too short, but not rows that carry more columns than tableDesc describes. It would be better to encode and store only the data columns described in tableDesc, since columns not in tableDesc are not used in any cube definition.
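
      The proposal above can be illustrated with a minimal sketch. This is not the attached patch and not Kylin's actual code: the class SnapshotRowProjector, the method project, and the parameter declaredColumnCount are hypothetical names chosen for illustration. It also assumes that columns added in Hive are appended after the columns already known to Kylin, so the declared columns are the leading cells of each row.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Kylin: trims a Hive row to the declared column count. */
final class SnapshotRowProjector {

    private SnapshotRowProjector() {
    }

    /**
     * Returns only the first declaredColumnCount cells of a Hive row.
     * Assumption: new Hive columns are appended after the columns in tableDesc.
     */
    static String[] project(String[] hiveRow, int declaredColumnCount) {
        if (hiveRow.length < declaredColumnCount) {
            // Too few cells: the row really is malformed, so fail loudly with context.
            throw new IllegalStateException("Bad hive table row, expected at least "
                    + declaredColumnCount + " columns but got " + hiveRow.length
                    + ": " + Arrays.toString(hiveRow));
        }
        // Extra trailing cells are dropped so the stored row width matches the metadata.
        return Arrays.copyOf(hiveRow, declaredColumnCount);
    }
}
```

      With this kind of projection applied while the snapshot is taken, a row read from a widened Hive table would be stored with exactly the width that the metadata describes, so a later reader that sizes its arrays from tableDesc would not index past the end.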

        Attachments

        1. KYLIN-1985.patch
          2 kB
          zhengdong

            People

            • Assignee:
              zhengd zhengdong
              Reporter:
              zhengd zhengdong