  1. Apache HAWQ (Retired)
  2. HAWQ-1270

Pluggable storage back-ends for HAWQ


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Since HAWQ only depends on Hadoop and Parquet for columnar format support, I would like to propose a pluggable storage back-end design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system as well as object and block storage. Ceph is also data-location aware, is written in C++, and is a more sophisticated storage back end than Hadoop at this time. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to nodes, rather than centrally managed namenodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS, since it is written in C++ and does not have scalability limitations when mapping data to storage pools, whereas in Hadoop the namenode is a point of contention.
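
      To make the proposal more concrete, a pluggable design could look roughly like the sketch below. It is only an illustration and not HAWQ code: `StorageBackend`, `register_backend`, `open_backend`, and the `mem://` scheme are hypothetical names, and the in-memory backend is a stand-in so the example runs without HDFS or Ceph. The idea is that the engine talks to one small interface and selects an implementation from the URI scheme of the table location.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical interface that every storage plug-in would implement."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the full contents stored at `path`."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store `data` at `path`, replacing any previous contents."""


class InMemoryBackend(StorageBackend):
    """Stand-in backend (assumption) used only to keep the sketch runnable."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


# Registry mapping a URI scheme to a factory that builds the backend.
_BACKENDS: Dict[str, Callable[[], StorageBackend]] = {}


def register_backend(scheme: str, factory: Callable[[], StorageBackend]) -> None:
    """Make a backend available under the given URI scheme."""
    _BACKENDS[scheme] = factory


def open_backend(uri: str) -> StorageBackend:
    """Pick a backend from the URI scheme, e.g. 'mem://warehouse/t1'."""
    scheme = urlparse(uri).scheme
    if scheme not in _BACKENDS:
        raise ValueError(f"no storage backend registered for scheme {scheme!r}")
    return _BACKENDS[scheme]()


# An HDFS or Ceph plug-in would register itself the same way under its own scheme.
register_backend("mem", InMemoryBackend)

backend = open_backend("mem://warehouse/t1")
backend.write("/t1/part-0", b"rows")
print(backend.read("/t1/part-0"))
```

      With a registry like this, adding Ceph support would mean shipping one more `StorageBackend` implementation and registering it, without touching the code that reads and writes table data.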

            People

              yjin Yi Jin
              dbuzolin Dmitry Buzolin
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated: