  1. Kylin
  2. KYLIN-4167

Refactor streaming coordinator

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v3.0.0
    • Component/s: Real-time Streaming
    • Labels: None

    Description

      Summary

      1. The coordinator currently has too many responsibilities, which violates the single responsibility principle and makes it hard to extend. Its responsibilities should be clearly separated.
      2. Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
      3. Resubmit the build job when it has been discarded.
      4. Clarify the overall design for real-time OLAP.

       

      StreamingCoordinator

      Facade of the coordinator. It controls BuildJobSubmitter and ReceiverClusterManager and delegates operations to them.

      BuildJobSubmitter

      The main responsibilities of BuildJobSubmitter are:

      1. Find candidate segments that are ready for a build job to be submitted.

      2. Track the status of each candidate segment's build job and promote the segment once it meets the requirements.
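
The tracking step above can be sketched as a small decision function. This is only an illustration under assumptions: the type names `BuildJobState`, `SubmitterAction` and `BuildJobTracker` are hypothetical, the `FINISHED` state is assumed (the issue names only NEW/RUNNING/ERROR/DISCARD), and leaving an `ERROR` job untouched is a guess rather than documented behaviour.

```java
/** Build job states. NEW/RUNNING/ERROR/DISCARD come from the issue; FINISHED is assumed. */
enum BuildJobState { NEW, RUNNING, ERROR, DISCARD, FINISHED }

/** Hypothetical actions the submitter could take after looking at a job. */
enum SubmitterAction { WAIT, RESUBMIT, PROMOTE_SEGMENT }

final class BuildJobTracker {
    private BuildJobTracker() {}

    /** Maps the observed job state to the next action for its candidate segment. */
    static SubmitterAction decide(BuildJobState state) {
        switch (state) {
            case FINISHED:
                // The build succeeded, so the segment can be promoted.
                return SubmitterAction.PROMOTE_SEGMENT;
            case DISCARD:
                // Summary item 3: resubmit when the job was discarded.
                return SubmitterAction.RESUBMIT;
            default:
                // NEW, RUNNING and (by assumption) ERROR: check again later.
                return SubmitterAction.WAIT;
        }
    }
}
```

Keeping the decision in a pure function makes it easy to call repeatedly from a periodic check without side effects of its own.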

       

      ReceiverClusterManager

      This class manages operations that involve multiple streaming receivers. These operations are often not atomic, and may or may not be idempotent.

      ClusterStateChecker

      Basic steps of this class:

      1. Stop/pause the coordinator to avoid underlying concurrency issues.

      2. Check for inconsistent state across the receiver cluster.

      3. Send a summary by mail to the Kylin admin.

      4. If needed, call ClusterDoctor to repair the inconsistencies.

      ClusterDoctor

      Repairs inconsistent state according to the result of ClusterStateChecker.
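
A minimal sketch of how the check-and-repair flow could be wired together. All names here (`CoordinatorSwitch`, `StateInspector`, `StateRepairer`, `StateCheckRun`) are hypothetical and not taken from the Kylin code base; resuming the coordinator in a `finally` block is also an assumption, since the issue only says that the coordinator is stopped or paused first.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical handle for pausing and resuming the coordinator. */
interface CoordinatorSwitch {
    void pause();
    void resume();
}

/** Hypothetical check that returns one description per inconsistency found. */
interface StateInspector {
    List<String> findInconsistencies();
}

/** Hypothetical stand-in for ClusterDoctor. */
interface StateRepairer {
    void repair(List<String> issues);
}

final class StateCheckRun {
    private StateCheckRun() {}

    static List<String> run(CoordinatorSwitch coordinator, StateInspector inspector,
                            Consumer<String> mailer, StateRepairer doctor, boolean autoRepair) {
        coordinator.pause();                                        // step 1
        try {
            List<String> issues = inspector.findInconsistencies();  // step 2
            String summary = issues.isEmpty()
                    ? "Cluster state is consistent"
                    : issues.size() + " inconsistent item(s): " + issues;
            mailer.accept(summary);                                 // step 3
            if (autoRepair && !issues.isEmpty()) {
                doctor.repair(issues);                              // step 4
            }
            return issues;
        } finally {
            coordinator.resume();  // assumption: always resume, even after a failure
        }
    }
}
```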

       


      Candidate Segment

      Candidate segments are the segments that are visible to the streaming coordinator.

      A candidate segment is in one of the following states/queues:

      1. segments whose data has been uploaded PARTLY

      2. segments whose data has been uploaded completely and that are WAITING to be built

      3. segments in the BUILDING state; the job's state should be one of NEW/RUNNING/ERROR/DISCARD

      4. segments that were built successfully and are waiting to be delivered to the historical part (and deleted from the real-time part)

      5. segments in the historical part (HBase Ready Segment)

       

      By design, a segment should move to the next queue sequentially (it must not jump the queue). Do not break this rule.
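
The sequential rule can be illustrated with a small state machine. This is a sketch only: the enum constants and the `CandidateSegment` class are hypothetical names chosen to mirror the five queues listed above, not the identifiers used in Kylin.

```java
import java.util.Optional;

/** Hypothetical names for the five queues, in the order a segment passes through them. */
enum CandidateSegmentState {
    PARTLY_UPLOADED,   // 1. data uploaded partly
    WAITING_TO_BUILD,  // 2. data uploaded completely, waiting for a build job
    BUILDING,          // 3. build job is NEW/RUNNING/ERROR/DISCARD
    BUILT,             // 4. built, waiting for delivery to the historical part
    HISTORICAL;        // 5. HBase Ready Segment

    /** The only legal successor, or empty for the last queue. */
    Optional<CandidateSegmentState> next() {
        CandidateSegmentState[] all = values();
        int i = ordinal() + 1;
        return i < all.length ? Optional.of(all[i]) : Optional.empty();
    }

    /** A move is legal only if it goes exactly one queue forward. */
    boolean canMoveTo(CandidateSegmentState target) {
        return next().map(n -> n == target).orElse(false);
    }
}

final class CandidateSegment {
    private final String name;
    private CandidateSegmentState state = CandidateSegmentState.PARTLY_UPLOADED;

    CandidateSegment(String name) {
        this.name = name;
    }

    CandidateSegmentState state() {
        return state;
    }

    /** Rejects any attempt to skip a queue or move backwards. */
    void moveTo(CandidateSegmentState target) {
        if (!state.canMoveTo(target)) {
            throw new IllegalStateException(
                    "Segment " + name + " cannot move from " + state + " to " + target);
        }
        state = target;
    }
}
```

Enforcing the rule in one place means a caller that tries to jump, for example from WAITING_TO_BUILD straight to HISTORICAL, fails immediately instead of leaving the cluster in an inconsistent state.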

      Atomicity

      In a multi-step transaction, the following aspects should be considered carefully:

      1. Should it fail fast or continue when an exception is thrown?

      2. Should the API (remote call) be synchronous or asynchronous?

      3. When the transaction fails, can the rollback always succeed?

      4. The transaction should be idempotent, so that a failure can be fixed by retrying.
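
To make point 4 concrete, here is a minimal sketch of retrying an idempotent step. `IdempotentRetry` and `FakeReceiver` are hypothetical helpers written for this example; the fake receiver simulates a remote call whose effect is applied even though the acknowledgement is lost, which is the case where idempotency makes a blind retry safe.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class IdempotentRetry {
    private IdempotentRetry() {}

    /** Runs the step until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T retry(Supplier<T> step, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                last = e;  // safe to try again only because the step is idempotent
            }
        }
        throw last;
    }
}

/** Toy receiver: assigning the same partition twice leaves the same final state. */
final class FakeReceiver {
    private final Set<Integer> partitions = new HashSet<>();
    private int failuresLeft;

    FakeReceiver(int failuresBeforeSuccess) {
        this.failuresLeft = failuresBeforeSuccess;
    }

    boolean assign(int partition) {
        partitions.add(partition);  // the effect is applied even when the ack is lost
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IllegalStateException("acknowledgement lost");
        }
        return true;
    }

    Set<Integer> partitions() {
        return partitions;
    }
}
```

With a non-idempotent step (for example, appending to a list instead of adding to a set), the same retry loop would apply the effect several times, which is why such operations need a different recovery path.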

       

      How can we ensure that whole-cluster operations run smoothly without blocking? I divided all multi-step transactions into three kinds:

      NotAtomicIdempotent

      NotAtomicAndNotIdempotent

      NonSideEffect
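
One way to express this classification in code is with marker annotations that a caller can inspect before deciding whether to retry. The sketch below is an assumption, not a description of how the issue was implemented: the annotation names reuse the three kinds listed above, while `ExampleClusterOps`, its method names and `RetryPolicy` are hypothetical. It also assumes, from the names alone, that idempotent and side-effect-free operations can be retried and that the remaining kind cannot.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicAndNotIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NonSideEffect {}

/** Hypothetical operations; the method names are illustrative only. */
class ExampleClusterOps {
    @NotAtomicIdempotent
    void assignPartitions() {}

    @NotAtomicAndNotIdempotent
    void moveReplicaSet() {}

    @NonSideEffect
    void describeCluster() {}
}

final class RetryPolicy {
    private RetryPolicy() {}

    /** Assumed rule: retry idempotent or read-only operations, never the others. */
    static boolean safeToRetry(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.isAnnotationPresent(NotAtomicIdempotent.class)
                        || m.isAnnotationPresent(NonSideEffect.class);
            }
        }
        throw new IllegalArgumentException("No such method: " + methodName);
    }
}
```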


          People

            hit_lacus Xiaoxiang Yu