Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Lucene Fields: New

      Description

      There are parts of Lucene that could potentially be sped up if computations were offloaded from the CPU to the GPU(s). With commodity GPUs having as much as 12 GB of high-bandwidth RAM, we might be able to leverage GPUs to speed up parts of Lucene (indexing, search).

      The first thing that comes to mind is spatial filtering, which is traditionally known to be a good candidate for GPU-based speedup (especially when complex polygons are involved). In the past, Mike McCandless has mentioned that "both initial indexing and merging are CPU/IO intensive, but they are very amenable to soaking up the hardware's concurrency."

      I'm opening this issue as an exploratory task, suitable for a GSoC project. I volunteer to mentor any GSoC student willing to work on this during the summer.

        Activity

        qwerty123 vikash added a comment -

        Hi, I am willing to work on this.

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        Here are some ideas on things to start out with:

        1. Copy over and index lots of points and corresponding docids to the GPU as an offline, one time operation. Then, given a query point, return top-n nearest indexed points.
        2. Copy over and index lots of points and corresponding docids to the GPU as an offline, one time operation. Then, given a polygon (complex shape), return all points that lie inside the polygon.

        In both cases, compare performance against the existing Lucene spatial search. One would need to choose the most suitable algorithm for doing these as efficiently as possible. Any GPGPU API (OpenCL, CUDA) can be used for initial exploration.
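
        For comparison purposes, a CPU baseline for the second idea could look roughly like the following Java sketch. It is only an assumption about how such a baseline might be structured: the class PolygonFilterSketch, its method names, and the flat coordinate arrays are hypothetical, and it uses plain even-odd ray casting rather than anything Lucene's spatial modules actually do. What it illustrates is that every indexed point is tested independently, which is the kind of loop a GPU kernel would run with one thread per point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CPU baseline: brute-force point-in-polygon filtering over flat arrays. */
final class PolygonFilterSketch {

    /** Even-odd ray casting test for one point against a polygon given as vertex arrays. */
    static boolean contains(double[] polyX, double[] polyY, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = polyX.length - 1; i < polyX.length; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through y?
            boolean crosses = (polyY[i] > y) != (polyY[j] > y);
            if (crosses) {
                // x coordinate where the edge intersects that horizontal line.
                double xAtY = polyX[j]
                        + (y - polyY[j]) * (polyX[i] - polyX[j]) / (polyY[i] - polyY[j]);
                if (x < xAtY) {
                    inside = !inside;
                }
            }
        }
        return inside;
    }

    /** Tests every indexed point independently and collects the matching docids. */
    static List<Integer> filter(int[] docIds, double[] xs, double[] ys,
                                double[] polyX, double[] polyY) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docIds.length; i++) {
            if (contains(polyX, polyY, xs[i], ys[i])) {
                hits.add(docIds[i]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] polyX = {0, 4, 4, 0};
        double[] polyY = {0, 0, 4, 4};
        int[] docIds = {10, 11, 12};
        double[] xs = {1, 5, 2};
        double[] ys = {1, 1, 3};
        System.out.println(filter(docIds, xs, ys, polyX, polyY));
    }
}
```

        With the square and the three points in main, the points with docids 10 and 12 fall inside the polygon. A GPU version would have to beat this loop including the cost of copying the coordinate arrays to the device.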

        David Smiley, Karl Wright, Nicholas Knize, Michael McCandless, given your depth and expertise in this area, do you have any suggestions? Any other area of Lucene that comes to mind which should be easiest to start with, in terms of exploring GPU based parallelization?

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        Another experiment that, I think, is worth trying out:

        • Benchmarking an aggregation over a DocValues field (e.g. using sqrt(), haversine distance etc.), and comparing the corresponding performance when executed on the GPU. This could potentially speed up scoring of results.
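
        As a rough illustration of the kind of per-document arithmetic meant here, the following Java sketch computes a mean haversine distance over two parallel arrays that stand in for a column-oriented DocValues field. The class name, the array layout, and the Earth radius constant are assumptions made for the example; reading real DocValues from a Lucene index is not shown.

```java
/** Hypothetical CPU baseline: aggregate haversine distances over column-style arrays. */
final class HaversineAggregationSketch {
    // Mean Earth radius in kilometres (an assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance between two (lat, lon) points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Mean distance from a query point to every document's (lat, lon) values. */
    static double meanDistanceKm(double[] lats, double[] lons, double qLat, double qLon) {
        double sum = 0;
        for (int doc = 0; doc < lats.length; doc++) {
            // Each iteration is independent, so the loop is a candidate for a GPU map + reduce.
            sum += haversineKm(qLat, qLon, lats[doc], lons[doc]);
        }
        return lats.length == 0 ? 0 : sum / lats.length;
    }

    public static void main(String[] args) {
        double[] lats = {0.0, 0.0};
        double[] lons = {0.0, 180.0};
        System.out.println(meanDistanceKm(lats, lons, 0.0, 0.0));
    }
}
```

        The per-document work is a handful of trigonometric calls followed by a sum, so any GPU gain would depend on whether that arithmetic outweighs the cost of moving the column to the device.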

        For reference, PostgreSQL seems to have experienced speedup in some areas (especially aggregations over column-oriented fields): https://www.slideshare.net/kaigai/gpgpu-accelerates-postgresql

        dsmiley David Smiley added a comment -

        I have a question for us all: (a) could whatever comes of this actually be contributed to Lucene itself, given the likelihood of requiring native OS bindings (let's presume in spatial-extras, as it seems this is the only module that can have an external dependency), and (b) does that matter for GSoC or for the expectations of the contributor? If (a) is a "no", we need to be honest up front with the contributor. I know that in the past Solr has been denied off-heap filters that would have required a non-pure-Java approach. A native binding would be another degree of impurity.

        mikemccand Michael McCandless added a comment -

        Maybe even the basic hit scoring that e.g. BooleanScorer does for a disjunction of high-frequency terms would be amenable to GPU acceleration? Today BooleanScorer processes a whole window of hits at once, doing fairly simple math (the Similarity methods) on each.
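
        To make "fairly simple math on a window of hits" concrete, here is a hedged Java sketch that adds one term's BM25 contribution to every slot of a score window. It uses the textbook BM25 formula on plain arrays; the class and method names are hypothetical, and Lucene's actual BooleanScorer and BM25Similarity differ in details such as how document lengths are encoded, so this is not a copy of either.

```java
/** Hypothetical sketch of window-at-a-time scoring with the textbook BM25 formula. */
final class WindowScoringSketch {

    /** Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Adds one term's BM25 contribution to every slot in a window of hits. */
    static void accumulate(float[] scores, int[] freqs, int[] docLengths,
                           float idf, float avgDocLength, float k1, float b) {
        for (int i = 0; i < scores.length; i++) {
            if (freqs[i] == 0) {
                continue; // the term does not occur in this document
            }
            // Length normalization, then the saturating term-frequency component.
            float norm = k1 * (1 - b + b * docLengths[i] / avgDocLength);
            scores[i] += idf * freqs[i] * (k1 + 1) / (freqs[i] + norm);
        }
    }

    public static void main(String[] args) {
        float[] scores = new float[3];
        int[] freqs = {1, 0, 4};
        int[] docLengths = {10, 10, 20};
        accumulate(scores, freqs, docLengths, idf(2, 3), 15f, 1.2f, 0.75f);
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

        Each slot is updated independently of the others, which is what makes the window a plausible unit of work to hand to a GPU kernel.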

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        David Smiley, that is a very important question. Afaik, there is no Apache-compatible GPGPU framework. Both OpenCL and CUDA are likely incompatible with the Apache license (I am not fully sure). I see that jCuda, which is a wrapper around the native libraries, is MIT-licensed.

        If there are benefits to using GPGPU processing, my thought is that we can add all the necessary plumbing to our codebase to offload processing to some plugin, whereby the user can plug in the exact GPU kernels from outside the Lucene distribution (if those kernels violate any licensing restrictions we have). If there are clear benefits in speeding things up using a GPU, it would not be the end of the world for the end user if the code lives outside the Apache distribution.

        If (a) is a "no", we need to be honest up front with the contributor.

        That is a good point, and we can document this clearly.

        thetaphi Uwe Schindler added a comment - - edited

        Hi,
        in general, including CUDA in Lucene may be a good idea, but I see no real possibility of doing this inside Lucene Core or any other module. My idea would be to add some abstraction to the relevant parts of Lucene and make it easier to "plug in" different implementations. Then this code could also be hosted outside Lucene (if licensing is a problem), anywhere on GitHub.

        We should still keep the following in mind: Mike's example looks "simple" as a quick test to see whether there are gains, but making the whole thing ready for commit or bundling in any project inside or outside Lucene is a whole different story. Currently BooleanScorer calls a lot of classes, e.g. the BM25 similarity or TF-IDF, to do the calculation that could possibly be parallelized. But to move all this to CUDA, you have to add "plugin points" in all those places and change the APIs completely. It is also hard to test, because none of our Jenkins servers has a GPU! Also, for average ("08/15") users of Lucene, this could be a huge problem if we add native stuff to Lucene that they may never use. Because of that it MUST BE SEPARATED from Lucene core. Completely...

        IMHO, if I were to focus solely on GPU parallelization, I'd create a completely new search engine in C code, like CLucene. The current iterator-based approaches are not easy to transform or plug into CUDA...

        For the GSoC project, we should make it clear to the GSoC student that this is just a project to "explore" GPU acceleration and whether it brings any performance gain at all. I doubt that it will, because the call overhead between Java and CUDA is way too high, in contrast to Postgres, where everything is in plain C/C++. The results would then be used to plan and investigate ways to include that in Lucene as "plugin points" (e.g., as SPI modules).
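
        One way such a "plugin point" could be wired up with Java's standard SPI mechanism is sketched below. The BatchScorer interface and the CPU fallback are hypothetical names invented for this example and are not existing Lucene APIs; only java.util.ServiceLoader is a real JDK class. An external jar (for instance one wrapping a GPU library) would register its own implementation under META-INF/services, and the core code would never reference it directly.

```java
import java.util.ServiceLoader;

/** Hypothetical sketch of an SPI-style plugin point with a pure-Java fallback. */
final class ScoringBackendSketch {

    /** Hypothetical extension point; not an existing Lucene interface. */
    public interface BatchScorer {
        String name();

        /** Writes one score per input frequency into out. */
        void score(int[] freqs, float weight, float[] out);
    }

    /** Pure-Java implementation used when no external provider is registered. */
    static final class CpuBatchScorer implements BatchScorer {
        @Override
        public String name() {
            return "cpu";
        }

        @Override
        public void score(int[] freqs, float weight, float[] out) {
            for (int i = 0; i < freqs.length; i++) {
                out[i] = weight * freqs[i];
            }
        }
    }

    /** Returns the first provider found via ServiceLoader, else the CPU fallback. */
    static BatchScorer load() {
        for (BatchScorer candidate : ServiceLoader.load(BatchScorer.class)) {
            return candidate;
        }
        return new CpuBatchScorer();
    }

    public static void main(String[] args) {
        BatchScorer scorer = load();
        float[] out = new float[2];
        scorer.score(new int[] {1, 3}, 2f, out);
        System.out.println(scorer.name() + " " + out[0] + " " + out[1]);
    }
}
```

        With no provider on the classpath, load() falls back to the CPU implementation, so users who never install a GPU plugin would see no native dependency at all.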

        qwerty123 vikash added a comment - - edited

        Hi all, I have been reading about GPU acceleration, and GPU-accelerated computing in particular, and I find this project very interesting. Can anyone give me a further lead on what is to be done now? The ideas that Ishan suggested are pretty good, but I still do not understand what Mr. David means by "(a) could whatever comes of this actually be contributed to Lucene itself". Why do you doubt that the outcome of this project could be contributed to Lucene?

        dsmiley David Smiley added a comment -

        vikash: not all working code contributed to an open-source project is necessarily welcome. Usually it is, but sometimes project members or ASF rules insist on certain things for the perceived greater good. In this case, I believe Uwe doesn't want Lucene to include anything that would only work with certain hardware or JVM vendors, even if it were optional and opt-in. Even if hypothetically nobody had such concerns here, be aware that any 3rd-party (non-ASF) libraries need to meet certain qualifications. For example, if whatever Java CUDA library you find happens to be licensed as GPL, then it's incompatible with ASF-run projects like this one. That's a hypothetical; I have no idea what Java CUDA libraries exist and what their licenses are. Regardless... if you come up with something useful, it's probably not necessary that Lucene itself change, and as seen here we have some willingness to change Lucene (details TBD) if it enables people to use Lucene with CUDA. Lucene has many extension points already, though I could imagine you might unfortunately need to copy/fork some long source files; Uwe mentioned some.

        Good luck.

        ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

        Hi Vikash,

        Regarding the licensing issue:
        The work done in this project would be exploratory. That code won't necessarily go into Lucene. When we are at a point where we see clear benefits from the work done here, we would then have to explore all aspects of productionizing it (including licensing).

        Regarding next steps:

        BooleanScorer calls a lot of classes, e.g. the BM25 similarity or TF-IDF to do the calculation that could possibly be parallelized.

        1. First, understand how BooleanScorer calls these similarity classes and does the scoring. There are unit tests in Lucene that can help you get there. This might help: https://wiki.apache.org/lucene-java/HowToContribute
        2. Write a standalone CUDA/OpenCL project that does the same processing on the GPU.
        3. Benchmark the speed of doing so on the GPU vs. the speed observed when doing the same through the BooleanScorer, preferably on a large result set. Include the time for copying results and scores in and out of device memory from/to main memory.
        4. Optimize step 2, if possible.
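
        The benchmarking step above can be sketched as a small timing harness like the one below. This is an assumption-laden illustration, not a real GPU benchmark: the "device" is simulated with array copies so that the sketch stays runnable without any native library, all names are hypothetical, and sqrt stands in for the real scoring math. A serious comparison would more likely use a benchmarking harness such as JMH and a real device backend. What it shows is that the timed region for the offloaded path covers copy-in, compute, and copy-out together.

```java
/** Hypothetical timing harness: the offloaded path is timed including its copies. */
final class OffloadBenchmarkSketch {

    /** Runs the task several times and returns the best wall-clock time in nanoseconds. */
    static long bestOfNanos(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    /** Stand-in for the scoring math done directly on the CPU. */
    static void scoreOnCpu(int[] freqs, float[] out) {
        for (int i = 0; i < freqs.length; i++) {
            out[i] = (float) Math.sqrt(freqs[i]);
        }
    }

    /** Simulated device round trip: copy in, compute, copy out. */
    static void scoreWithRoundTrip(int[] freqs, float[] out) {
        int[] deviceIn = freqs.clone();                      // simulated host-to-device copy
        float[] deviceOut = new float[out.length];
        scoreOnCpu(deviceIn, deviceOut);                     // simulated kernel launch
        System.arraycopy(deviceOut, 0, out, 0, out.length);  // simulated device-to-host copy
    }

    public static void main(String[] args) {
        int[] freqs = new int[1 << 20];
        for (int i = 0; i < freqs.length; i++) {
            freqs[i] = i % 50;
        }
        float[] out = new float[freqs.length];
        long cpu = bestOfNanos(5, () -> scoreOnCpu(freqs, out));
        long roundTrip = bestOfNanos(5, () -> scoreWithRoundTrip(freqs, out));
        System.out.println("cpu ns: " + cpu + ", with copies ns: " + roundTrip);
    }
}
```

        Both paths must produce identical scores before their timings are worth comparing, so a correctness check on the two output arrays belongs in the benchmark as well.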

        Once this is achieved (which in itself could be a sufficient GSoC project), one can have stretch goals to try out other parts of Lucene to optimize (e.g. spatial search).

        Another stretch goal, if the results of the optimizations are positive, could be to integrate the solution into Lucene. The most suitable way to do so would be to create hooks in Lucene so that plugins can be built to delegate parts of the processing to external code, and then to write a plugin (that uses jCuda, for example) and do an integration test.

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        I have no idea what Java CUDA libraries exist and what their licenses are.

        jCuda happens to be MIT, which is, afaik, compatible with Apache license.
        http://www.jcuda.org/License.txt

        qwerty123 vikash added a comment -

        Hello all,
        I have been reading a lot about how GPUs work and about GPU parallelization, in particular general-purpose computing on graphics processing units (GPGPU), and I have also looked in detail at the source code of BooleanScorer.java. I am having no difficulty understanding how it works, since Java is my speciality, so the job was quite fun. A few things are still unclear to me, but I am reading and experimenting, so I will resolve them soon. It is a nice idea to use the GPU to perform the search and indexing operations on a document, and that would be faster on the GPU. Regarding the licensing issue: as was said above, the code that we write may not necessarily go into Lucene. Assuming this happens, will licensing still be an issue if we use the libraries in our code? As Uwe Schindler said, we may host the code on GitHub. Certainly it would not be a good idea to develop code for special hardware, but we can still give it a try and try to make it compatible with most hardware. I don't mind if this code does not go into Lucene, but we can try to change Lucene and make it better. I am preparing myself for it, and things will stay on track with your kind mentorship.
        So should I submit my proposal now, or do I need to complete all four steps that Ishan described in the last comment and then submit my proposal? And which one of the following ideas would you prefer to mentor me on, that is, which one do you think would be better to continue with?

        >Copy over and index lots of points and corresponding docids to the GPU as an offline, one time operation. Then, given a query point, return top-n nearest indexed points.
        >Copy over and index lots of points and corresponding docids to the GPU as an offline, one time operation. Then, given a polygon (complex shape), return all points that lie inside the polygon.
        >Benchmarking an aggregation over a DocValues field and comparing the corresponding performance when executed on the GPU.
        >Benchmarking the speed of calculations on GPU vs. speed observed when doing the same through the BooleanScorer. Preferably, on a large result set with the time for copying results and scores in and out of the device memory from/to the main memory included?
        -Vikash

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        Hi Vikash,
        I suggest you read the student manuals for GSoC.
        Submit a proposal describing how you want to approach this project, including technical details (as many as possible) and detailed timelines.

        Regarding the following:

        1. First, understand how BooleanScorer calls these similarity classes and does the scoring. There are unit tests in Lucene that can help you get there. This might help: https://wiki.apache.org/lucene-java/HowToContribute
        2. Write a standalone CUDA/OpenCL project that does the same processing on the GPU.
        3. Benchmark the speed of doing so on GPU vs. speed observed when doing the same through the BooleanScorer. Preferably, on a large resultset. Include time for copying results and scores in and out of the device memory from/to the main memory.
        4. Optimize step 2, if possible.

        If you've already understood step 1, feel free to make a proposal on how you will use your GSoC coding time to achieve steps 2-4. Also, you can look at other stretch goals to include in the coding time. I would consider steps 2-4, if done properly and successfully, to be a good GSoC contribution in themselves. And if these steps are done properly, then either Lucene integration can be proposed for the latter part of the coding phase (the last 2-3 weeks, I'd think), or exploratory work on other parts of Lucene (apart from the BooleanScorer, e.g. spatial search filtering etc.) could be taken up.

        Time is running out, so kindly submit a proposal as soon as possible. You can submit a draft first, have one of us review it and then submit it as final after the review. If the deadline is too close, there might not be enough time for this round of review, and in such a case just submit the draft as final.

        Also, remember that a lot of GPGPU coding is done in C, so familiarity/experience with that is a plus.

        (Just a suggestion that makes sense to me, and feel free to ignore: bullet points work better than long paragraphs, even though the length of sentences can remain the same)

        qwerty123 vikash added a comment -

        Yes, I had already read the student manual. The deadline is 3rd April, which is very close; in my preparation I had almost missed the application deadline. For the proposal, my draft is here (you may comment on it and I will do the needful):

        https://docs.google.com/document/d/1HGxU1ZudNdAboj0s9WKTWJk3anbZm86JY-abaflXoEI/edit?usp=sharing

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        I have left initial comments on your draft. Let me know if you want another round of review, perhaps after you've addressed the current comments.

        qwerty123 vikash added a comment -

        Hi Ishan,
        I have changed the proposal according to your instructions. Can you review it again?

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        Hi Vikash,
        I have reviewed the proposal. It is still extremely disorganized, and it is not clear what your goals are or how you have split them up into tasks. It contains a lot of copy-pasted comments/statements from this JIRA or comments from the proposal itself. The level of detail still seems inadequate to me.

        I had proposed a possible way to structure your proposal (by splitting the three months into three different areas of focus, all of which I specified in the comments), but I don't see that you've done so. I asked you to find out, at least, what the default Similarity in Lucene is called (and to attempt to simulate the scoring for that on the GPU). It seems you have not done so.

        At this point, I don't think much can be done (just 2 hours to go for submission). Wish you the best.

        qwerty123 vikash added a comment -

        It is my first GSoC, so it was a bit difficult for me to draft the proposal properly.

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        If you still haven't submitted your proposal, I have an idea for you to improve your chances.
        Include a link to a GitHub repository for your initial experiments in the application.
        After that, you can try to build a prototype in the next few days (until assessment starts) that demonstrates that you are on the right track. This is not strictly necessary, but just throwing out an idea that might benefit you.

        All the best and regards!

        qwerty123 vikash added a comment - - edited

        Oops, I could not do that; I have already submitted my proposal. If you check it now, the latest edited version is the one I submitted. I made some changes to it again before submitting, but sadly I could not change the GitHub link, which only points to my GitHub profile. Can I still start working now? I will give you the link to my work, and if possible, could you show my work to the Apache Software Foundation? Will that be OK?
        As I said in my proposal, I will start working in April itself. Will the repository I build for Lucene, and the work I store there, be checked by the ASF by visiting my profile and navigating to the Lucene repository I create there? Can that help increase my chances?
        And who will review my proposal?


          People

          • Assignee:
            Unassigned
            Reporter:
            ichattopadhyaya Ishan Chattopadhyaya
          • Votes:
            0
            Watchers:
            11
