Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-4291

Parallel segment building may causes WriteConflictException

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v2.6.4
    • Fix Version/s: v2.6.5, v3.1.0
    • Component/s: Job Engine
    • Labels:
      None

      Description

      We noticed a segment refresh job at ERROR status at "2019-12-10 02:27:15". After a deep dive into kylin.log, we found that the WriteConflictException was raised when trying to update the last modified time for the dict.

       

      (Sensitive content are consered)

      2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : error execute HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, name=Build Dimension Dictionary, state=RUNNING}
      org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict, expect old TS 1575916011421, but it is 1575916035573
      at org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
      at org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
      at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
      at org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
      at org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
      at org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
      at org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
      at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
      at org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
      at org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
      at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
      at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
      at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
      at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
      at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
      at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      

      We have a glance at the codebase (Kylin-2.6.4), we found the last modified time of the dict (which has been built) will be updated at the step "Build Dimension Dictionary". The log blow shows the same dict is processed many times.

      Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, save it"|grep "THIS_IS_COLUMN_NAME"
      2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 : Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
      
      

       

      Since we issue the refresh segments request concurrently (with REST APIs), the dict has a chance to be modified concurrently. Kylin uses checkAndPut to update HBase to prevent race-condition, so the WriteConflictException may be raised when updating the same resource. 
       

      Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep "Update dictionary "
      2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575914830203
      2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575914831090
      2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575914840528
      2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575914841727
      2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575914843095
      2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575916003771
      2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575916011421
      2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575916035573
      2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 : Update dictionary /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict lastModifiedTime to 1575916182549
      

      The log above shown that the job "f76fb12e-477f-b404-e021-adaf520d04bb" has not been print out the log - "Update dictionary blablabla...." because of the WriteConflictException.
       
      After contacted with community, Shao Feng Shi has been confirmed that seems to be a bug introduced in KYLIN-3977 in Kylin 2.6.3.
       

       

       

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                zhangyaqian Yaqian Zhang
                Reporter:
                pwrliang Liang Geng
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: