Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Storage
    • Labels:

      Description

      MongoDB is a widely used open source database with a document-oriented data model. Supporting MongoDB as a storage would benefit both Tajo users and MongoDB users.

        Activity

        RCheungIT Haoran Zhang added a comment -

        Dear Jaehwa,

        I'm Haoran Zhang, an MSc (Computing) student at Imperial College.
        I worked for about three years at Alibaba Group in the area of distributed computing.
        Here is my CV (www.doc.ic.ac.uk/~hz114/haoran_cv.pdf); I hope it helps you get to know me better.

        I'm very keen to participate in Google Summer of Code 2016, and this issue is very interesting to me.

        I have a question about this issue. It plans to add MongoDB as a data source for Tajo. Does that mean the proposal is to add a proxy between MongoDB and Tajo that converts the data format to what Tajo needs? In other words, it will not involve any changes to query execution. Is that correct?

        In addition, could you suggest any materials that would help me get familiar with this issue so that I can prepare a formal proposal?

        Thank you very much.

        Haoran Zhang

        blrunner Jaehwa Jung added a comment -

        Dear Haoran Zhang,

        Thank you for your interest.

        You don't have to implement a proxy server between Tajo and MongoDB.
        Tajo provides two interfaces for supporting various storages.

        One is 'Scanner', which processes one tuple per call. It resets its cursor during the initialization phase.
        On each call, it reads a tuple from a specific storage according to the given schema and moves the cursor to the next tuple.

        The other is 'Appender', which writes a given tuple to the underlying storage, flushes buffered tuples, and calculates stats such as the accumulated number of written tuples.
        Like Scanner, it handles tuples one at a time; the main difference is that one reads and the other writes.

        Various storages have already been implemented using these interfaces. You can see them in the tajo-storage module:
        https://github.com/apache/tajo/tree/master/tajo-storage
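
        To make the read/write split described above concrete, here is a minimal sketch of the two roles against an in-memory list. The names SimpleScanner, SimpleAppender, InMemoryScanner, InMemoryAppender, and getWrittenCount are hypothetical and chosen only for illustration; they are not Tajo's actual interfaces or signatures, which should be checked in the tajo-storage module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified sketch; these are NOT Tajo's actual interfaces.
class TupleStorageSketch {

    /** Reader role: one tuple per call. */
    interface SimpleScanner {
        void init();      // resets the cursor to the first tuple
        Object[] next();  // returns the tuple at the cursor and advances; null when exhausted
    }

    /** Writer role: one tuple per call, with buffering and a simple stat. */
    interface SimpleAppender {
        void addTuple(Object[] tuple);  // buffers a single tuple
        void flush();                   // moves buffered tuples into the store
        long getWrittenCount();         // number of tuples flushed so far
    }

    static final class InMemoryAppender implements SimpleAppender {
        private final List<Object[]> store;
        private final List<Object[]> buffer = new ArrayList<>();
        private long written = 0;

        InMemoryAppender(List<Object[]> store) { this.store = store; }

        public void addTuple(Object[] tuple) { buffer.add(tuple); }

        public void flush() {
            store.addAll(buffer);
            written += buffer.size();
            buffer.clear();
        }

        public long getWrittenCount() { return written; }
    }

    static final class InMemoryScanner implements SimpleScanner {
        private final List<Object[]> store;
        private int cursor;

        InMemoryScanner(List<Object[]> store) { this.store = store; }

        public void init() { cursor = 0; }

        public Object[] next() {
            return cursor < store.size() ? store.get(cursor++) : null;
        }
    }

    public static void main(String[] args) {
        List<Object[]> store = new ArrayList<>();

        InMemoryAppender appender = new InMemoryAppender(store);
        appender.addTuple(new Object[] {1, "alice"});
        appender.addTuple(new Object[] {2, "bob"});
        appender.flush();
        System.out.println("written = " + appender.getWrittenCount());

        InMemoryScanner scanner = new InMemoryScanner(store);
        scanner.init();
        for (Object[] t = scanner.next(); t != null; t = scanner.next()) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

        A MongoDB-backed version would presumably replace the in-memory list with a driver cursor on the read side and buffered inserts on the write side, but that mapping is an assumption and not something this thread specifies.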

        Please feel free to ask us anything.

        Best Regards,
        Jaehwa

        bjchathuranga@gmail.com Janaka Chathuranga Thilakarathna added a comment -

        Hi Jaehwa Jung,
        I am a Computer Science and Engineering undergraduate at the University of Moratuwa, Sri Lanka. I am really interested in distributed computing, and I have fairly good programming experience.

        I went through the available storage modules in the Tajo repository: https://github.com/apache/tajo/tree/master/tajo-storage As I understand it, the main objective of the project is to implement 'tablespace', 'scanner' and 'appender' for tajo-storage-mongodb. Am I correct?

        Also, to connect to MongoDB, are we going to use mongo-java-driver or something else?
        Thank you very much.

        Best Regards,
        Janaka.

        blrunner Jaehwa Jung added a comment -

        Hi Janaka Chathuranga Thilakarathna,

        Thanks for your interest!

        This is an umbrella issue for providing MongoDB storage. You need to implement the main components, which include Tablespace, Fragment, Appender, and Scanner. I think the following pages will help you prepare for this issue.

        These are Jihoon's comments on supporting Kudu storage:
        https://issues.apache.org/jira/browse/TAJO-2046

        Tajo already provides HBase as one of its storages:
        https://github.com/apache/tajo/tree/master/tajo-storage/tajo-storage-hbase

        And as you mentioned, I also think that the MongoDB Java client is the best choice.

        If you have more questions, please feel free to ask me anytime.

        Regards,
        Jaehwa

        bjchathuranga@gmail.com Janaka Chathuranga Thilakarathna added a comment -

        Hi Jaehwa,

        Thank you for the reply. I'm working on it. By the way, is there a mailing list that discusses this issue and that I should subscribe to?
        Thank you!

        Regards,
        Janaka.

        eminency Jongyoung Park added a comment - - edited

        Hi, please refer to this link: http://tajo.apache.org/mailing-lists.html

        I think you can discuss this on the developers mailing list.

        bjchathuranga@gmail.com Janaka Chathuranga Thilakarathna added a comment -

        Thank you!

        bjchathuranga@gmail.com Janaka Chathuranga Thilakarathna added a comment -

        Hi Jongyoung,
        Sorry to bother you again. I subscribed to the mailing list dev@tajo.apache.org, but I still haven't received email from any thread.
        1. Have I subscribed to the wrong mailing list, or is it not active?
        2. Is it OK if I start a new conversation regarding this issue?

        Regards,
        Janaka

        Thank you!


          People

          • Assignee:
            bjchathuranga@gmail.com Janaka Chathuranga Thilakarathna
            Reporter:
            blrunner Jaehwa Jung
          • Votes:
            1
            Watchers:
            4

            Dates

            • Created:
              Updated:

              Development