  1. Tajo
  2. TAJO-2046

Support Kudu as one of Tajo's storage

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Storage
    • Labels:

      Description

      Kudu (https://github.com/cloudera/kudu) is a newly emerging storage system for high-performance updates and analytical query processing. Supporting Kudu would benefit Tajo users by simplifying their architecture and reducing analysis latency.

        Activity

        mucahid.erenler Mücahid Erenler added a comment -

        Hi Jihoon,

        I'm Mücahid, and I'm currently studying Computer Engineering at Sakarya University, Turkey. I have studied Kudu and Tajo and find this issue interesting. I'm also interested in distributed systems and their applications. I believe I can contribute to Tajo with your mentorship.

        I have also studied Tajo's source code, and I think we would develop a new storage module for Tajo in the "tajo-storage" (https://github.com/apache/tajo/tree/master/tajo-storage) sub-module.

        Could you give some more technical information about this issue? Is there anything I can do at this point?

        Thank you!
        Mücahid

        jihoonson Jihoon Son added a comment -

        Hi Mücahid Erenler, thanks for your interest!

        Your starting point looks good. I also think we need to add a new sub-module like "tajo-storage-kudu" to the tajo-storage module.

        Tajo has a concept of Tablespace (http://tajo.apache.org/docs/devel/table_management/tablespaces.html) to support various types of underlying storage. You can think of a Tablespace as the abstract interface between Tajo and underlying data sources. Each tablespace represents the storage type where data is stored and provides an interface to access it (Scanner and Appender in Tajo).
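
To make the Tablespace idea more concrete, here is a minimal, self-contained sketch of the three roles this thread mentions: a tablespace that generates splits, a fragment that names one slice of the data, and a scanner that reads only that slice. Every name in it (`SimpleTablespace`, `RowRangeFragment`, `SimpleScanner`, `InMemoryTablespace`) is hypothetical and invented for illustration; none of them is Tajo's or Kudu's real API, and the real interfaces are considerably richer. The row-offset splitting is also an assumption: an actual Kudu integration would more likely split along Kudu's own partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT Tajo's real classes.
class TablespaceSketch {

    /** Fragment role: identifies which slice of a table one task should process. */
    static final class RowRangeFragment {
        final String table;
        final int startRow; // inclusive
        final int endRow;   // exclusive

        RowRangeFragment(String table, int startRow, int endRow) {
            this.table = table;
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return table + "[" + startRow + "," + endRow + ")";
        }
    }

    /** Scanner role: reads only the rows named by one fragment. */
    interface SimpleScanner {
        List<String> readAll();
    }

    /** Tablespace role: hides the storage behind split generation and scanner creation. */
    interface SimpleTablespace {
        List<RowRangeFragment> getSplits(String table, int rowsPerSplit);
        SimpleScanner getScanner(RowRangeFragment fragment);
    }

    /** Toy storage backend that keeps every table as a list of rows in memory. */
    static final class InMemoryTablespace implements SimpleTablespace {
        private final Map<String, List<String>> tables;

        InMemoryTablespace(Map<String, List<String>> tables) {
            this.tables = tables;
        }

        @Override
        public List<RowRangeFragment> getSplits(String table, int rowsPerSplit) {
            if (rowsPerSplit <= 0) {
                throw new IllegalArgumentException("rowsPerSplit must be positive");
            }
            int rowCount = tables.get(table).size();
            List<RowRangeFragment> splits = new ArrayList<>();
            // Cut the table into consecutive, non-overlapping row ranges.
            for (int start = 0; start < rowCount; start += rowsPerSplit) {
                int end = Math.min(start + rowsPerSplit, rowCount);
                splits.add(new RowRangeFragment(table, start, end));
            }
            return splits;
        }

        @Override
        public SimpleScanner getScanner(RowRangeFragment f) {
            // The scanner touches only the rows covered by its fragment.
            return () -> new ArrayList<>(tables.get(f.table).subList(f.startRow, f.endRow));
        }
    }

    public static void main(String[] args) {
        SimpleTablespace ts =
            new InMemoryTablespace(Map.of("t", List.of("a", "b", "c", "d", "e")));
        // Each fragment could be handed to a different task and scanned independently.
        for (RowRangeFragment f : ts.getSplits("t", 2)) {
            System.out.println(f + " -> " + ts.getScanner(f).readAll());
        }
    }
}
```

The point of the design is that the query engine only ever talks to the tablespace and scanner interfaces, so a new storage type can be added by supplying another implementation.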

        The goal of this ticket is to add a tablespace for Kudu. Here are the mandatory issues, I think.

        • Implement KuduTablespace
            • Split generation: We need to consider how to create splits (fragments) for distributed processing (the Tablespace.getSplits() method).
            • Tablespace code: https://github.com/apache/tajo/blob/master/tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/Tablespace.java
        • Implement KuduFragment
            • The Fragment is similar to the split in MapReduce. It describes which part of the data will be processed by each task.
            • Fragment code: https://github.com/apache/tajo/blob/master/tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/fragment/Fragment.java
        • Implement KuduScanner and KuduAppender
            • Split read: We need to consider how to read the part of the data specified in the given fragment.
            • Type conversion: Data types and internal representations should be converted between Tajo and Kudu.
            • Projection push down: Tajo needs to be able to access only the necessary columns.
            • Scanner code: https://github.com/apache/tajo/blob/master/tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/Scanner.java
            • Appender code: https://github.com/apache/tajo/blob/master/tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/Appender.java

        The issues below are optional, but would be very helpful for Tajo.

        • Filter push down optimization: Since Kudu can process simple predicates, Tajo can read only the data that satisfies those predicates.

        If you have more questions, please feel free to ask me anytime.

        Thanks,
        Jihoon
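
The projection push down and filter push down items in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `PushDownSketch`, `EqualsPredicate`, `row`, and `scan` are hypothetical names, the "storage" is an in-memory list, and nothing here is Tajo's or Kudu's real API. It shows the intent, which is that the storage side drops non-matching rows and unneeded columns before anything is returned to the query engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not Tajo's or Kudu's real API.
class PushDownSketch {

    /** A simple "column = value" predicate, the kind a storage layer can evaluate itself. */
    static final class EqualsPredicate {
        final String column;
        final Object value;

        EqualsPredicate(String column, Object value) {
            this.column = column;
            this.value = value;
        }
    }

    /** Builds one example row with the columns id, name, and city. */
    static Map<String, Object> row(int id, String name, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        r.put("city", city);
        return r;
    }

    /**
     * Storage-side scan: applies the predicate (filter push down) and keeps only the
     * requested columns (projection push down) before returning rows to the caller.
     * A null predicate means "no filter".
     */
    static List<Map<String, Object>> scan(List<Map<String, Object>> table,
                                          List<String> projection,
                                          EqualsPredicate predicate) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : table) {
            if (predicate != null
                    && !Objects.equals(r.get(predicate.column), predicate.value)) {
                continue; // row rejected at the storage layer, never shipped to the engine
            }
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : projection) {
                projected.put(column, r.get(column)); // copy only the requested columns
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> table = List.of(
            row(1, "a", "Seoul"),
            row(2, "b", "Sakarya"),
            row(3, "c", "Seoul"));
        // Roughly: SELECT id FROM table WHERE city = 'Seoul'
        System.out.println(scan(table, List.of("id"), new EqualsPredicate("city", "Seoul")));
    }
}
```

Without push down, the engine would receive all three rows with all three columns and discard most of that data itself; with push down, less data crosses the storage boundary.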
        tlipcon Todd Lipcon added a comment -

        Feel free to ping the Kudu dev mailing list if you need any tips. We're happy to help with your integration!

        jihoonson Jihoon Son added a comment -

        Thanks Todd Lipcon!
        We're currently hoping to handle this issue via GSoC. Some students have applied for this issue, and we are waiting for the result.
        No matter what the result is, we will support Kudu as soon as possible!


          People

          • Assignee:
            seian Byunghoon Lim
            Reporter:
            jihoonson Jihoon Son
          • Votes:
            0
            Watchers:
            5
