Currently, TajoClient connects to both TajoMaster and TajoWorker. A query submission is passed from TajoClient to TajoMaster, and then TajoMaster forwards the query to a query master running on a TajoWorker. After that, the client monitors the query progress through the query master.
In other words, TajoClient contacts both TajoMaster and TajoWorker. This has the following disadvantages:
- A network firewall must allow client connections to TajoWorker nodes.
- All components must maintain complex state and communication logic with one another.
TAJO-1160 is trying to remove the Hadoop dependency from TajoClient. In TAJO-1160, TajoClient will receive query results only from a server that forwards them to the client, instead of reading them directly from HDFS. TajoMaster would be the best component to forward the query results because TajoClient always connects to TajoMaster. So, if TajoClient communicates only with TajoMaster, all client logic and the protocol would be simpler.
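
The proposed flow can be illustrated with a minimal sketch. It is not Tajo code: the class names (SketchMaster, SketchQueryMaster, SketchClient) and their methods are hypothetical, and in-process method calls stand in for RPC. It only shows the shape of the idea, where the client holds a single reference to the master and the master relays both the submission and the results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a query master running on a worker node.
class SketchQueryMaster {
    private final Map<String, List<String>> results = new HashMap<>();

    // Pretend to execute the query and store a single placeholder row.
    void run(String queryId, String sql) {
        List<String> rows = new ArrayList<>();
        rows.add("result of: " + sql);
        results.put(queryId, rows);
    }

    // Return the stored rows, or an empty list for an unknown query id.
    List<String> fetch(String queryId) {
        return results.getOrDefault(queryId, new ArrayList<>());
    }
}

// Hypothetical master: the only endpoint the client ever talks to.
class SketchMaster {
    private final SketchQueryMaster queryMaster = new SketchQueryMaster();
    private int nextId = 0;

    // Accept a submission and forward it to the query master internally.
    String submitQuery(String sql) {
        String queryId = "q_" + (nextId++);
        queryMaster.run(queryId, sql);
        return queryId;
    }

    // Relay results so the client needs no connection to workers or HDFS.
    List<String> fetchResults(String queryId) {
        return queryMaster.fetch(queryId);
    }
}

// Hypothetical client that depends on the master alone.
class SketchClient {
    private final SketchMaster master;

    SketchClient(SketchMaster master) {
        this.master = master;
    }

    List<String> execute(String sql) {
        String queryId = master.submitQuery(sql);
        return master.fetchResults(queryId);
    }
}
```

In this sketch the client never sees SketchQueryMaster, so only the master's address would need to be reachable through a firewall. The trade-off is that all result traffic passes through the master.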