  1. Kylin
  2. KYLIN-4607

Distributed scheduler kills the job instead of fetcherRunner

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v3.0.0-alpha
    • Fix Version/s: None
    • Component/s: Job Engine
    • Labels: None

    Description

      see KYLIN-4250
      For DistributedScheduler, even if FetchFailed is true, the job is not in runningJobs, and its status is running, FetcherRunner should not kill the job, because the job may have been scheduled by another Kylin service. Instead, the distributed scheduler itself kills the job when isMetaDataPersistException is true.
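
As a rough illustration of the rule above, the following Java sketch models the two kill decisions. Every name in it (`FetcherKillSketch`, `SchedulerType`, `fetcherShouldKill`, `ownerShouldKill`) is hypothetical and not taken from the Kylin code base; only the conditions mirror the description. It also assumes that a non-distributed scheduler keeps the fetcher-side kill.

```java
// Hypothetical sketch: names are illustrative, not Kylin's actual API.
final class FetcherKillSketch {
    enum SchedulerType { DEFAULT, DISTRIBUTED }

    /** Decides whether the fetcher may kill a job that looks orphaned on this node. */
    static boolean fetcherShouldKill(SchedulerType type, boolean fetchFailed,
                                     boolean inLocalRunningJobs, boolean statusRunning) {
        // The situation from the description: fetch failed, job unknown locally, status still running.
        boolean looksOrphaned = fetchFailed && !inLocalRunningJobs && statusRunning;
        // Under a distributed scheduler another node may own the job, so the fetcher must not kill it.
        // Assumption: for a non-distributed scheduler this sketch leaves the fetcher kill in place.
        return looksOrphaned && type != SchedulerType.DISTRIBUTED;
    }

    /** The node that actually runs the job kills it when persisting metadata failed. */
    static boolean ownerShouldKill(boolean isMetaDataPersistException) {
        return isMetaDataPersistException;
    }

    public static void main(String[] args) {
        System.out.println(fetcherShouldKill(SchedulerType.DISTRIBUTED, true, false, true));
        System.out.println(ownerShouldKill(true));
    }
}
```

The point of the split is that the kill decision moves to the process that owns the job, which is the only one that can know the job is truly broken rather than running elsewhere.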

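      A test case was commented out. That case relied on the following logic: a task ends up in an abnormal state, but the metastore has a problem and the job's error state cannot be persisted, so the fetcher changes the job from the running state to the error state. This logic is flawed.
      For DefaultScheduler, if JobRunner catches a persistence exception, it runs the force-kill logic, which loops until it succeeds. There is no need for the fetcher to force-kill the job as well, because if the persistence failure prevents JobRunner's force-kill from succeeding, the fetcher will not succeed either.
      For DistributedScheduler, the fetcher's force-kill logic would mistakenly kill jobs that are running in other processes, so it is even less acceptable to keep it.
      For details, see https://zhuanlan.zhihu.com/p/154376900
      So this test case can be deleted.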
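
The retry-until-success force-kill described for DefaultScheduler can be pictured with the minimal Java sketch below. It is an assumption-laden illustration, not Kylin's implementation: `KillRetrySketch`, `killUntilPersisted`, and the `persistKilled` callback are hypothetical names, and a real loop would normally wait between attempts.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names are illustrative, not Kylin's actual API.
final class KillRetrySketch {
    /**
     * Calls persistKilled repeatedly until it reports success.
     * Returns the number of attempts that were needed.
     */
    static int killUntilPersisted(BooleanSupplier persistKilled) {
        int attempts = 0;
        do {
            attempts++;
        } while (!persistKilled.getAsBoolean());
        return attempts;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated metastore that rejects the first two writes.
        int attempts = killUntilPersisted(() -> ++calls[0] >= 3);
        System.out.println("attempts = " + attempts);
    }
}
```

Because the loop only exits once the state has been persisted, a second kill path in the fetcher adds nothing: it would hit the same metastore failure.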

          People

            xiaoge chuxiao
            xiaoge chuxiao
            Votes: 0
            Watchers: 2