  1. Kylin
  2. KYLIN-4819

Build cube fails when `kylin.metadata.hbase-client-retries-number` is greater than 1

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v3.1.1
    • Fix Version/s: v3.1.2
    • Component/s: Job Engine
    • Labels: None

    Description

      2020-11-11 07:31:49,187 TRACE [Scheduler 2133794029 Job 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] hbase.HBaseResourceStore:334 : Update row /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10 from oldTs: 1605051060239, to newTs: 1605051080210, operation result: false
      2020-11-11 07:31:49,196 ERROR [Scheduler 2133794029 Job 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] common.MapReduceExecutable:212 : error execute MapReduceExecutable{id=70c242ce-6756-f77a-4b79-6b75c6ecd884-10, name=Build N-Dimension Cuboid : level 5, state=RUNNING}
      org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10, expect old TS 1605051060239, but it is 1605051080210
       at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:337)
       at org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:443)
       at org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:440)
       at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
       at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceWithRetry(ResourceStore.java:440)
       at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:428)
       at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:422)
       at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:402)
       at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:381)
       at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:252)
       at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:426)
       at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:570)
       at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:177)
       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
       at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
       at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
      

      When the HBase cluster has performance problems or regions move, Kylin may fail to access HBase. Many of these exceptions can be recovered from by retrying, so I suggest setting the default number of retries to 3 (see KYLIN-4711).

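      However, once retrying is enabled, a WriteConflictException appears in some scenarios; it is caused by the checkAndPut operation. In the log above, the row's timestamp already equals newTs when the check fails, which suggests that an earlier attempt had already applied the write.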
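
      The interaction between retrying and a check-and-put can be sketched as follows. This is a minimal illustration under stated assumptions, not Kylin's or HBase's actual code: the `Row` class, the `update` method, the `maxAttempts` parameter, and the `loseFirstReply` flag are all hypothetical, and the "lost reply" is only one way such a conflict might arise. It assumes that the retry count acts as a cap on the total number of attempts.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckAndPutRetrySketch {

    /** Hypothetical stand-in for one metadata row guarded by its timestamp. */
    static final class Row {
        private final AtomicLong ts;

        Row(long initialTs) {
            this.ts = new AtomicLong(initialTs);
        }

        /** Atomically writes newTs only if the stored timestamp still equals expectedOldTs. */
        boolean checkAndPut(long expectedOldTs, long newTs) {
            return ts.compareAndSet(expectedOldTs, newTs);
        }

        long currentTs() {
            return ts.get();
        }
    }

    /**
     * Runs the update with at most maxAttempts attempts. When loseFirstReply is true,
     * the first attempt is applied to the row but the caller never sees the reply
     * (modelling a timeout or a region move), so it tries again with the same arguments.
     */
    static String update(Row row, long oldTs, long newTs, int maxAttempts, boolean loseFirstReply) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean applied = row.checkAndPut(oldTs, newTs);
            if (attempt == 1 && loseFirstReply) {
                continue; // reply lost: the caller cannot tell whether the write happened
            }
            if (applied) {
                return "ok";
            }
            // The check failed: the stored timestamp is no longer oldTs.
            return "conflict: expect old TS " + oldTs + ", but it is " + row.currentTs();
        }
        return "failed: no reply after " + maxAttempts + " attempt(s)";
    }

    public static void main(String[] args) {
        long oldTs = 1605051060239L;
        long newTs = 1605051080210L;
        // Single attempt with a delivered reply: the write succeeds.
        System.out.println(update(new Row(oldTs), oldTs, newTs, 1, false));
        // Three attempts, first reply lost: the second attempt fails its own check.
        System.out.println(update(new Row(oldTs), oldTs, newTs, 3, true));
    }
}
```

      In this sketch the second call reports a conflict even though no other writer touched the row: the first attempt already moved the timestamp to newTs, so the repeated check against oldTs can no longer pass. A check-and-put is not idempotent, which is why blindly repeating it is riskier than repeating a plain read or an unconditional put.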

            People

              Assignee: Guangxu Cheng (gxcheng)
              Reporter: Guangxu Cheng (gxcheng)