Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Since HAWQ only depends on Hadoop, and on Parquet for columnar format support, I would like to propose a pluggable storage backend design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system, object storage, and block storage. Ceph is also data-location aware, is written in C++, and is a more sophisticated storage backend than Hadoop at this time. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to nodes, rather than centrally managed namenodes. I don't think HDFS offers any of these features.

In terms of performance, Ceph should be faster than HDFS because it is written in C++ and because it has no scalability limitation when mapping data to storage pools, whereas in Hadoop the namenode is a point of contention.
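To make the proposal more concrete, here is a minimal sketch of what a pluggable storage layer could look like. It is written in Python only for brevity and is not HAWQ code: the names `StorageBackend`, `register_backend`, `open_backend`, and the `mem` scheme are all hypothetical, and the in-memory backend exists only so the sketch runs. The idea is that each backend registers itself under a URI scheme and the engine selects one by the scheme of the table location.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical interface that every storage plugin would implement."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the full contents stored at path."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store data at path, replacing any previous contents."""


# Registry mapping a URI scheme (for example "hdfs" or "ceph") to a backend class.
_BACKENDS: Dict[str, Type[StorageBackend]] = {}


def register_backend(scheme: str):
    """Class decorator that registers a backend under the given URI scheme."""
    def decorator(cls: Type[StorageBackend]) -> Type[StorageBackend]:
        _BACKENDS[scheme] = cls
        return cls
    return decorator


@register_backend("mem")
class InMemoryBackend(StorageBackend):
    """Stand-in backend, present only to make the sketch runnable."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = bytes(data)


def open_backend(uri: str) -> StorageBackend:
    """Instantiate the backend registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in _BACKENDS:
        raise ValueError(f"no storage backend registered for scheme {scheme!r}")
    return _BACKENDS[scheme]()
```

Under this assumed design, an HDFS plugin and a Ceph plugin would each be one more class registered under its own scheme, wrapping the corresponding client library, and the rest of the engine would only ever talk to the `StorageBackend` interface.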