Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
In HDFS-7343 we want to develop a comprehensive storage management solution, originating from community discussions, that allows convenient, intelligent, and effective use of various HDFS facilities such as erasure coding, HDFS cache, and HSM. It would be driven by insights from events and data collected from NameNodes, DataNodes, frameworks, and applications via a pub-sub messaging system. In HDFS-8940 it was suggested that the proposed large-scale inotify feature would be better implemented on top of Kafka, so that it can serve thousands of consumers (inotify clients).
Apache Kafka is a distributed messaging system that aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds, and it is currently widely used for real-time stream processing. Given the two important use cases above, we propose introducing Kafka into the Hadoop platform as a fundamental event pub-sub service. Analogous to the FileSystem abstraction, we'd like to provide a MessagingSystem abstraction in the Hadoop style, conforming to Hadoop security and backed by either an internal Kafka cluster or an existing external one. The new service is intended to be easy to use, and it can distribute and exchange various types of events across IO, storage, and computation, whether produced by Hadoop itself or by frameworks and applications running on top of it. On this basis, valuable events can then be analyzed centrally, enabling meaningful applications and usages to be developed.
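To make the FileSystem analogy more concrete, here is a minimal sketch of what a pluggable messaging abstraction could look like. All names here (MessagingSystem, publish, subscribe, InMemoryMessagingSystem, the topic "hdfs.namespace.events") are hypothetical and are not an existing Hadoop or Kafka API. The in-memory implementation only stands in for a Kafka-backed one: it delivers events synchronously and ignores security, partitioning, and persistence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/**
 * Hypothetical pub-sub abstraction, similar in spirit to Hadoop's FileSystem:
 * callers code against this class, and a concrete backend (e.g. Kafka) is plugged in.
 */
abstract class MessagingSystem {
    /** Publish an event to the given topic. */
    public abstract void publish(String topic, String event);

    /** Register a handler that receives every event published to the topic after subscription. */
    public abstract void subscribe(String topic, Consumer<String> handler);
}

/** In-memory stand-in for a Kafka-backed implementation, for illustration only. */
class InMemoryMessagingSystem extends MessagingSystem {
    // topic -> registered handlers; CopyOnWriteArrayList allows safe iteration during publish
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    @Override
    public void publish(String topic, String event) {
        // Deliver synchronously to every current subscriber of this topic.
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(event);
        }
    }

    @Override
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }
}

public class MessagingSystemDemo {
    public static void main(String[] args) {
        MessagingSystem ms = new InMemoryMessagingSystem();
        List<String> received = new ArrayList<>();
        // e.g. an inotify-style client interested in namespace events
        ms.subscribe("hdfs.namespace.events", received::add);
        ms.publish("hdfs.namespace.events", "CREATE /user/alice/data.csv");
        ms.publish("other.topic", "not delivered to this subscriber");
        System.out.println(received); // prints [CREATE /user/alice/data.csv]
    }
}
```

The design choice mirrors FileSystem: producers and consumers depend only on the abstract class, so the backing transport (an internal Kafka cluster, an external one, or a test double like the one above) can be swapped without changing client code.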
The design document is in progress and will be submitted within a week. Feedback is very welcome. Thanks!
Be aware that 3.0.0-alpha1 introduced the hadoop-kafka module under hadoop-tools. That should be taken into consideration when/if this gets introduced into the Hadoop source tree. For example, it might be worthwhile to rename that module to hadoop-kafkametrics and this one to hadoop-kafkafs, or to bundle this feature into that module, or ...