Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: nutchgora
    • Component/s: None
    • Labels:
      None

      Description

      NUTCH-808 makes it clear that we need an ORM layer on top of the datastore, so that different backends can be used to store data.

      This issue will track the development of the ORM layer. Initially, full support for HBase is planned, with RDBMS, Hadoop MapFile and Cassandra support scheduled for later.
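      To illustrate what "an ORM layer so that different backends can be used" means, here is a minimal sketch of the idea: application code talks to one backend-neutral store interface, and each backend (HBase, RDBMS, MapFile, Cassandra) supplies its own implementation. The names DataStore, WebPage, InMemoryDataStore and OrmSketch are hypothetical and chosen for illustration only; this is NOT the API of Gora or any other real library, and the in-memory map merely stands in for a real backend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical backend-neutral store interface (illustrative, not Gora's API).
interface DataStore<K, T> {
    void put(K key, T value);
    Optional<T> get(K key);
    boolean delete(K key); // returns true if a row was removed
}

// Hypothetical persistent object a crawler might store.
final class WebPage {
    final String url;
    final String title;

    WebPage(String url, String title) {
        this.url = url;
        this.title = title;
    }
}

// In-memory stand-in backend. An HBase, RDBMS, MapFile or Cassandra backend
// would implement the same interface against its own storage.
final class InMemoryDataStore<K, T> implements DataStore<K, T> {
    private final Map<K, T> rows = new HashMap<>();

    public void put(K key, T value) { rows.put(key, value); }

    public Optional<T> get(K key) { return Optional.ofNullable(rows.get(key)); }

    public boolean delete(K key) { return rows.remove(key) != null; }
}

public class OrmSketch {
    // Depends only on the interface, so the backend can be swapped
    // without touching this code.
    static String titleOf(DataStore<String, WebPage> store, String key) {
        return store.get(key).map(p -> p.title).orElse("<missing>");
    }

    public static void main(String[] args) {
        DataStore<String, WebPage> store = new InMemoryDataStore<>();
        String key = "http://nutch.apache.org/";
        store.put(key, new WebPage(key, "Apache Nutch"));
        System.out.println(titleOf(store, key)); // prints "Apache Nutch"
        store.delete(key);
        System.out.println(titleOf(store, key)); // prints "<missing>"
    }
}
```

      The design choice being illustrated is that swapping storage only requires a new DataStore implementation; code such as titleOf stays unchanged.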

        Activity

        Doğacan Güney added a comment -

        Nutch now uses GORA as an ORM layer.

        Closing this issue as fixed.

        Chris A. Mattmann added a comment -

        +1, close it out...

        Doğacan Güney added a comment -

        Any objections to closing this issue?

        Enis Soztutar added a comment -

        Hi Piet,
        The code for Gora will live on GitHub for now, since Nutch and Gora are largely orthogonal. But as stated before, Nutch is Gora's first user, and Gora does not yet have a separate community, so I intend to keep the Nutch community updated (via this issue and the nutch-dev mailing list) and hope for feedback from it.

        Moreover, NutchBase has already been ported to Gora, so at some point Gora should be reviewed and accepted as a dependency for Nutch.

        Piet Schrijver added a comment -

        Will development of Gora be tracked under this or any other Nutch ticket?

        Enis Soztutar added a comment -

        As per the discussion above, I have developed the code that was once part of NutchBase for handling object-to-HBase mapping into a new project. The project is named Gora, and it is hosted on GitHub at
        http://github.com/enis/gora

        A short design document is at http://wiki.github.com/enis/gora/design, and a quick start guide is at http://wiki.github.com/enis/gora/quick-start.

        You can check out the code using
        $ git clone git://github.com/enis/gora.git

        What does it mean for Nutch?
        Gora started as part of Dogacan's NutchBase implementation, but the project's goals are clearly different. Even so, Gora is being developed primarily to handle Nutch's use cases. Specifically, Gora will handle the HBase integration layer for NutchBase, and later a Hadoop MapFile- or TFile-based persistence layer will be developed.

        In the short term, we plan to use Gora's artifacts as a library in Nutch. Either Dogacan or I will switch the current NutchBase branch to Gora shortly.

        Gora is still at a very early stage and needs your support. We would be more than happy if the Nutch community could share comments, feedback, use cases and feature requests, or even patches. I suppose we can use this issue or the mailing list for this.

        Enis Soztutar added a comment -

        Actually, we plan to develop the code for this layer in a separate project because:

        • The ORM layer is orthogonal to the Nutch code, so it does not belong there
        • Extracting the code later will be much harder
        • If developed well, this code will be useful to other projects (interestingly, there is currently no API that supports both HBase and Cassandra)
        • The code will be much cleaner
        • Nutch can use the artifacts from this project

        Nevertheless, we plan to piggyback on the Nutch community for the initial development, review and exposure. I will update this issue as the code develops and will kindly ask for reviews. In the long term, we can move the project to the Apache Sandbox, or make it a Hadoop/Nutch subproject (once Nutch becomes a TLP).

        A design document and initial code will be available shortly.


          People

          • Assignee:
            Enis Soztutar
            Reporter:
            Enis Soztutar
          • Votes:
            1 Vote for this issue
            Watchers:
            2
