Uploaded image for project: 'Apache Tez'
  1. Apache Tez
  2. TEZ-4000

Enable downstream vertex connecting to an EPHEMERAL data source, to reason about network connections of upstream tasks.

    XMLWordPrintableJSON

Details

    • Task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      This is an umbrella task for TEZ-3997

      Another property that is usually shared with CONCURRENT on the same edge is EPHEMERAL data source. When two vertices are running concurrently, direct communications between tasks in those vertices become possible, and oftentimes necessary, throughout the lifetime of the running task. This can be articulated by an EPHEMERAL data sources, and this change aims to support such scenarios, which are readily found in real-time applications (such as interactive query) and/or customized applications that would like to control their own data communications (such as parameter-server).

       

      This change will allow Tez to be the central orchestrator that gathers necessary network information from all upstream tasks, compiles them together and send it to downstream tasks. Particularly, the following changes are planned:

      1. For two vertices connected via an edge with both CONCURRENT scheduling type and EPHEMERAL data source type, the task in upstream vertex will open network port, and send an VertexManagerEvent(VME) immediately upon running. The payload of VME includes necessary information to communicate to this task through direct network communication (such as ip and port). The vertex manager of the downstream vertex, typed VertexManagerWithConcurrentInputs, will receive these VMEs, and are responsible for aggregate (including de-dup if necessary) all information together in onVertexManagerEventReceived().
      2. Once all VMEs have been received, a CustomProcessorEvent will be constructed with a payload that includes the aggregated information, and be routed to downstream tasks.

      The change will introduce additional optional entries in VertexManagerEventPayload and a new custom payload that will be embedded into CustomProcessorEvent.

       

      Upon completion of functional feature in this change, additional feature such as handling of failover in CONCURRENT/EPHEMERAL edge will be addressed in future umbrea JIRAs.

      Attachments

        Activity

          People

            yingdachen Yingda Chen
            yingdachen Yingda Chen
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: