Kylin / KYLIN-5163

Global dictionary build job may produce incomplete dictionary file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v4.0.1
    • Fix Version/s: None
    • Component/s: Job Engine
    • Labels: None

    Description

      The dictionary Spark build job currently uses `NBucketDictionary.saveBucketDict` to write the dictionary files (both the CURR file and the PREV file) for each partition. However, it does not account for multiple tasks running concurrently for the same partition, which can happen with task retries or speculative execution. Concurrent tasks for one partition can leave an incomplete dictionary file behind, and we have encountered this issue in production.

      The timeline of the issue was as follows:
      1. During the dictionary building phase, an executor, E1, was preparing to build the dictionary file for partition 0.
      2. The driver sent E1 a shutdown message because of YARN resource preemption. The driver then marked the task for partition 0 as failed and scheduled a retry task on another executor, E2.
      3. E2 began processing the task and finished it quickly.
      4. After E2 finished, E1 began processing its own task: it deleted the complete dictionary file created by E2 and created a new dictionary file to write to.
      5. E1 then received the driver's shutdown message and killed itself, leaving behind an incomplete dictionary file.
      6. After the other partitions finished, the stage was marked successful.
      7. When the next phase, table encoding, read the incomplete dictionary file, that stage failed.
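
One possible mitigation for the race in the timeline above is to have every task attempt write to its own temporary file and publish it with a single rename, so that a slow or dying attempt can never delete or truncate a file that another attempt has already completed. The following is only a minimal sketch under stated assumptions, not Kylin code: the class `AttemptSafeDictWriter`, the file naming scheme, and the method signature are hypothetical, and the local filesystem (`java.nio.file`) stands in for HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch, not Kylin code. The local filesystem stands in for HDFS. */
public class AttemptSafeDictWriter {

    /**
     * Writes the dictionary entries of one partition to an attempt-private
     * temporary file, then publishes that file with one atomic rename.
     * The file names ("CURR_<partition>", ".tmp" suffix) are assumptions.
     */
    public static Path saveBucketDict(Path dir, int partitionId, long taskAttemptId,
                                      List<String> entries) throws IOException {
        Files.createDirectories(dir);
        Path finalFile = dir.resolve("CURR_" + partitionId);
        Path tmpFile = dir.resolve(
                ".CURR_" + partitionId + ".attempt_" + taskAttemptId + ".tmp");

        // Each attempt touches only its own temp file, so an attempt that is
        // killed mid-write leaves a stray temp file, never a broken final file.
        Files.write(tmpFile, entries, StandardCharsets.UTF_8);

        // Publish in one step. On POSIX filesystems this rename replaces an
        // existing target; the Java API leaves that behavior
        // implementation-specific when ATOMIC_MOVE is requested.
        Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
        return finalFile;
    }
}
```

In a Spark task the attempt identifier could be taken from `TaskContext.get().taskAttemptId()`. Rename semantics on HDFS differ from the local filesystem, so a real fix would need to handle the case where the target file already exists, and would also need to clean up temporary files left by killed attempts.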

       

          People

            Assignee: Unassigned
            Reporter: hujiahua (sleep1661)
            Votes: 0
            Watchers: 2