Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
Apache Kafka serves a number of use cases. One of them is feeding streaming big data processes, in either Apache Spark or Hadoop environments. A Kafka output connector can be used to stream or dispatch crawled documents, or just their metadata, into such a big data processing pipeline.
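The sketch below shows, under stated assumptions, how an output connector might turn a crawled document into a Kafka message. It is written in Python for illustration only. `CrawledDocument`, `to_kafka_record`, the `include_content` flag, and the JSON layout are hypothetical and do not describe the connector's actual message format. The record is keyed by URL. With Kafka's default partitioner, records that share a key land on the same partition, so repeated crawls of one URL stay in order for downstream Spark or Hadoop consumers.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CrawledDocument:
    # Hypothetical representation of a document fetched by the crawler.
    url: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def to_kafka_record(doc: CrawledDocument, include_content: bool = True) -> Tuple[bytes, bytes]:
    """Build a (key, value) byte pair for a Kafka message (hypothetical format).

    The key is the document URL, so records for the same URL share a key.
    The value is a JSON payload with the metadata and, optionally, the content.
    """
    payload = {"url": doc.url, "metadata": doc.metadata}
    if include_content:
        payload["content"] = doc.content
    key = doc.url.encode("utf-8")
    # sort_keys makes the serialized payload deterministic.
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


# Usage: dispatch only the metadata of a crawled page.
doc = CrawledDocument(url="https://example.com/a", content="Hello", metadata={"lang": "en"})
key, value = to_kafka_record(doc, include_content=False)
print(key)    # b'https://example.com/a'
print(value)  # b'{"metadata": {"lang": "en"}, "url": "https://example.com/a"}'
```

With a client library such as kafka-python, the resulting pair could then be passed to `KafkaProducer.send(topic, key=key, value=value)`. The topic name and broker addresses depend on the deployment.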