  1. VXQuery
  2. VXQUERY-188

Fully integrate Lucene Indexing into VXQuery

    Details

      Description

      Currently, the indexing project for VXQuery is able to perform two tasks:

      1) Create a Lucene index from an XML collection (folder).
      2) Specify in a query to use this index to perform the query.

      There are several more desired capabilities to fully integrate Lucene indexing:

      1) Allow updates to collection indexes (when Adding/Deleting/Modifying XML files).
      2) Extend indexing to HDFS folders.
      3) Benchmark the difference between index and non-index plans.
      4) Enable queries to dynamically decide at runtime when to use indexes.
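For item 3, a benchmark could start from a small timing harness like the one below. This is only a sketch in Python, not VXQuery code: the plan names and the two stand-in workloads (a linear scan versus a dictionary lookup) are assumptions chosen to mimic "scan the collection" and "use the index".

```python
import time


def benchmark(plans, repeats=5):
    """Time each callable in `plans`; return the best wall-clock time per plan."""
    results = {}
    for name, run in plans.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        # The minimum is less sensitive to background noise than the mean.
        results[name] = min(timings)
    return results


# Stand-in workloads (hypothetical): a linear scan versus a hash lookup.
data = list(range(100_000))
lookup = {value: True for value in data}

timings = benchmark({
    "collection-scan": lambda: 99_999 in data,
    "index": lambda: 99_999 in lookup,
})
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

In a real comparison the two callables would run the same query once against the collection and once against the index, on the same data.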

          Activity

          menaka Menaka Madushanka added a comment -

          Hello,

          I'm Menaka Madushanka, a final-year Computer Engineering undergraduate at the University of Peradeniya, Sri Lanka. I find this project very interesting and have decided to work on it. I have 3+ years of experience in Java and have used Lucene in one of my projects.

          I would be very grateful if you could give me some more information and guidelines about this project.
          Thank you very much
          Menaka Madushanka

          sjaco002 Steven Jacobs added a comment -

          Hi,
          Welcome to the project! We created this issue as a potential Google Summer of Code (GSOC) project for this year. Are you interested in working on it as part of GSOC, or are you just interested in the project in general? We would be happy to bring you aboard either way. GSOC is a great opportunity to get paid over the summer while working on a project.

          menaka Menaka Madushanka added a comment -

          Hi James,
          I'd like to do this as a GSoC project. I love contributing to open source projects. Last year I did a GSoC project with Apache Taverna, and I think I could successfully complete this project.
          So I'd like to know some more information about it. I have forked the repo and am trying to get familiar with it.

          Thanks and Regards
          Menaka Madushanka

          sjaco002 Steven Jacobs added a comment -

          Great!
          We are excited to see your proposal. As a start for the project, we recommend the following:

          Go through the developer starter page [1]
          Check out our Wiki page for more developer information [2]

          [1] http://vxquery.apache.org/developer_get_started.html
          [2] https://cwiki.apache.org/confluence/display/VXQUERY/Index

          menaka Menaka Madushanka added a comment -

          Thank you very much Steven!


          Menaka Madushanka Jayawardena
          Faculty of Engineering, <http://www.pdn.ac.lk/eng>
          University of Peradeniya.
          LinkedIn <http://lk.linkedin.com/in/menakajayawardena>
          TP:- 071 885 1183/ 071 350 5470

          menaka Menaka Madushanka added a comment -

          Hello,

          I need some clarification about the project tasks.
          As I understand it, the indexing should be done in the following way:

          query from Cli ---> Indexing the sources ----> Executing the query in system.

          With this model, how to dynamically decide at runtime whether to do indexing is not very clear to me. Is it an option in the query that enables indexing?
          May I also get some explanation of the 3rd task?

          Thanks and Regards
          Menaka

          sjaco002 Steven Jacobs added a comment -

          Our current model involves pre-indexing sources, not indexing at runtime. We have a "build index" statement that creates an index for a given collection. Since our Lucene index is path-based, the index is actually capable of returning the results for the query.
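To illustrate what "path-based" can mean here, the following is a toy sketch in Python rather than VXQuery's actual Java/Lucene code. The function names, the in-memory dictionary, and the sample documents are all assumptions; the point is only that an index keyed by element path can answer a simple path query without re-parsing the XML files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(documents):
    """Map each element path to the (document name, text) pairs found at it.

    `documents` is a dict of {name: XML string}, a toy stand-in for a
    collection folder; this is not VXQuery's real index layout.
    """
    index = defaultdict(list)

    def walk(element, prefix, name):
        path = f"{prefix}/{element.tag}"
        text = (element.text or "").strip()
        if text:
            index[path].append((name, text))
        for child in element:
            walk(child, path, name)

    for name, xml_text in sorted(documents.items()):
        walk(ET.fromstring(xml_text), "", name)
    return dict(index)


def query_path(index, path):
    """Answer a simple path query from the index alone, without re-parsing."""
    return [text for _, text in index.get(path, [])]


docs = {
    "a.xml": "<books><book><title>XQuery Basics</title></book></books>",
    "b.xml": "<books><book><title>Lucene in Use</title></book></books>",
}
index = build_path_index(docs)
print(query_path(index, "/books/book/title"))
# prints ['XQuery Basics', 'Lucene in Use']
```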

          Right now the query can be written in two forms: query the collection, or query the index. The improvement would be to dynamically decide whether the index would make the execution faster or not, i.e.

          query from Cli ---> Decide whether to use the index ----> Executing the query in system accordingly.
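One way such a decision step might look is sketched below in Python. Everything in it is hypothetical: the `CollectionStats` fields, the selectivity heuristic, and the 0.2 threshold are illustrative assumptions, not an existing VXQuery planner rule.

```python
from dataclasses import dataclass


@dataclass
class CollectionStats:
    """Hypothetical statistics a planner might keep for one collection."""
    has_index: bool
    index_is_current: bool   # False if files changed since the last build
    total_documents: int
    estimated_matches: int   # documents the query is expected to touch


def should_use_index(stats, max_selectivity=0.2):
    """Return True when the index plan looks cheaper than a full scan."""
    if not stats.has_index or not stats.index_is_current:
        return False
    if stats.total_documents == 0:
        return False
    selectivity = stats.estimated_matches / stats.total_documents
    # Assumed heuristic: the index pays off only for selective queries.
    return selectivity <= max_selectivity


def choose_plan(stats):
    """Pick between the two query forms described above."""
    return "index" if should_use_index(stats) else "collection-scan"


print(choose_plan(CollectionStats(True, True, 1000, 10)))
print(choose_plan(CollectionStats(True, True, 1000, 900)))
```

The threshold would need to come from benchmarking index plans against non-index plans rather than from a fixed constant.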

          The third task comes from the fact that the Lucene index is built manually offline. This means that changes to the XML files will not automatically be reflected in the index. This task would be to build some sort of update mechanism (faster than a simple rebuild of the index), whether by adding a manual rebuild query or possibly by finding a way to automatically update before use (keeping in mind that VXQuery is not an "online" system, meaning that it can't have an active monitoring task).
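Since there is no monitoring task, one possible approach to "update before use" is to store a snapshot of the collection at build time and compare it with the folder when a query arrives. The Python sketch below assumes content hashes of `.xml` files as the change signal; the function names and the snapshot format are hypothetical, not part of VXQuery.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Return {relative path: SHA-256 digest} for every .xml file under folder."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.xml"))
    }


def diff_snapshots(old, new):
    """Compare two snapshots and report which files need index work."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "deleted": deleted, "modified": modified}


# Example with hand-written snapshots (digests shortened for readability).
at_build_time = {"a.xml": "111", "b.xml": "222", "c.xml": "333"}
before_query = {"a.xml": "111", "b.xml": "999", "d.xml": "444"}
print(diff_snapshots(at_build_time, before_query))
# prints {'added': ['d.xml'], 'deleted': ['c.xml'], 'modified': ['b.xml']}
```

Only the files in the three lists would then need to be added to, removed from, or re-indexed in the Lucene index, which avoids a full rebuild. Hashing every file still reads the whole collection, so modification times could be a cheaper (if less reliable) signal.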

          menaka Menaka Madushanka added a comment -

          Thank you very much Steven.



          menaka Menaka Madushanka added a comment -

          Hi Steven,

          I have shared my draft proposal via the Google Summer of Code site.
          I would be very grateful if you could tell me whether anything should be modified or changed.

          Thank you very much
          Menaka

          prestonc Preston Carman added a comment -

          Modified the objectives to create a better workflow and removed JSONiq from this issue. Indexing JSON can come after we have a working JSONiq implementation.

          menaka Menaka Madushanka added a comment -

          Thank you very much, Preston.


            People

            • Assignee:
              Unassigned
              Reporter:
              sjaco002 Steven Jacobs