Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields.
      Example:
      fq={!join from=parent_ptr to=parent_id}child_doc:query
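      A toy, in-memory model of the semantics in the description may help. It collects the indexed tokens of the `from` field on documents matching the inner query, then returns documents whose `to` field contains any of those tokens. The document layout, the `join` function and the sample data are hypothetical illustrations; this is not the patch's implementation.

```python
from typing import Callable, Dict, List, Set

# Hypothetical in-memory model: each document maps a field name to its indexed tokens.
Doc = Dict[str, List[str]]


def join(docs: List[Doc], from_field: str, to_field: str,
         inner_query: Callable[[Doc], bool]) -> List[int]:
    """Toy model of {!join from=... to=...}: returns positions of matching docs."""
    # Step 1: gather every indexed token of from_field across docs matching the inner query.
    join_keys: Set[str] = set()
    for doc in docs:
        if inner_query(doc):
            join_keys.update(doc.get(from_field, []))
    # Step 2: keep docs whose to_field shares at least one token with those keys.
    return [i for i, doc in enumerate(docs)
            if join_keys.intersection(doc.get(to_field, []))]


docs: List[Doc] = [
    {"id": ["parent-1"], "parent_id": ["p1"]},
    {"id": ["parent-2"], "parent_id": ["p2"]},
    {"id": ["child-1"], "parent_ptr": ["p1"], "child_doc": ["query"]},
    {"id": ["child-2"], "parent_ptr": ["p2"], "child_doc": ["other"]},
]

# Rough analogue of fq={!join from=parent_ptr to=parent_id}child_doc:query
matches = join(docs, "parent_ptr", "parent_id",
               lambda d: "query" in d.get("child_doc", []))
print(matches)  # [0]
```

      In this model the number of documents matched by the inner query only affects the size of the intermediate key set; it says nothing about Solr's actual performance.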

      1. SOLR-2272.patch
        58 kB
        Yonik Seeley
      2. SOLR-2272.patch
        43 kB
        Yonik Seeley
      3. SOLR-2272.patch
        46 kB
        Yonik Seeley

        Activity

        Suryansh Purwar added a comment -

        I'm not getting the expected results. Will it be a problem if the number of results returned by the "from" query is large, say around 1000?

        Christopher Ball added a comment -

        This does not appear to support use with a delete query . . .

        For example, the following does not work:

        http://localhost:8984/solr/myMusic/update?stream.body=<delete><query>{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*</query></delete>&commit=true
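        The comment does not say how the request failed. As a hedged sketch only, the following builds the same request with `stream.body` fully URL-encoded, which a hand-typed browser URL may not do reliably. It takes the inner query to be `*:*`, which is an assumption: the scraped text shows only `:`, apparently with the asterisks lost to JIRA markup. The helper `delete_by_query_url` is hypothetical, and nothing here shows whether `{!join}` is accepted inside a delete-by-query.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def delete_by_query_url(base: str, core: str, query: str, commit: bool = True) -> str:
    """Hypothetical helper: build a Solr update URL whose stream.body is a delete-by-query."""
    # XML-escape the query (&, <, >) so it is safe inside the <query> element.
    body = f"<delete><query>{escape(query)}</query></delete>"
    params = {"stream.body": body}
    if commit:
        params["commit"] = "true"
    # urlencode percent-encodes {, !, =, <, >, / and similar characters; spaces become '+'.
    return f"{base}/solr/{core}/update?{urlencode(params)}"


url = delete_by_query_url(
    "http://localhost:8984", "myMusic",
    "{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*")
print(url)
```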

        C S added a comment -

        Is there a chance that this will make it into the 3.x branch?

        Bill Bell added a comment -

        Is this committed, yes or no?

        If yes, can we make it work for 3.x?

        Greg Stein added a comment -

        The only technical reason to revert somebody else's commit is if it breaks the build, and there is no obvious fix to make it build again (i.e., the revert re-enables the project to move forward). There is absolutely NO possible technical reason to perform a revert of somebody's commit which merely adds a feature. If you think there is, then I have serious concerns about your idea of how things should operate here at the Foundation. (Robert may object to a change... totally fine, especially if there are technical concerns, and he may work to see it backed out or fixed or altered... but he should never unilaterally revert it himself just because he disagrees or vetoes; the original committer should say "I see your veto, and I don't see us reaching any compromise, so I'll back it out now")

        Grant Ingersoll added a comment -

        I'm not saying it was right to revert, etc., but I do believe both Yonik and Robert had technical reasons for what they did, even if the solution they arrived at was too drastic.

        Robert Muir added a comment -

        It's the same argument we have here, all over again. If we could actually freely refactor stuff from solr into lucene (e.g. spatial search functionality, which seems like some lucene users MIGHT just want), then it really doesn't matter where it goes.

        Sure, it's "ideal" if the patch puts the core functionality in lucene and the solr bits in solr, but it would still be "great" even if the patch starts totally solr, and there is at least the hope in the future of refactoring (if someone is willing to do the work).

        Yonik Seeley added a comment (edited) -

        Theoretically refactorings are the kinds of things that newer contributors could be helping with but currently you have to put on a full suit of body armor and prepare to do "battle".

        There are hypothetical scenarios which you imagine like that... and then
        there are real scenarios where you discourage contributions to Solr that
        do not directly benefit Lucene.

        http://markmail.org/message/hpvkrqe5ap3vjuci

        Robert Muir added a comment -

        Hi Mark, there is at least one other comment on the general@ "vote" thread that feels the project is somehow dominated by one person: there are other people besides me who feel it's not a balanced community.

        As far as refactorings go, it's not very sexy stuff and we have to move past battling over it constantly. Theoretically refactorings are the kinds of things that newer contributors could be helping with but currently you have to put on a full suit of body armor and prepare to do "battle".

        Greg Stein added a comment -

        Oh, and as for "bullshit vetoes". Yeah. Look at Yonik's in LUCENE-2995 and Richard's in this issue. I call both of those bullshit. (and then Robert stepped way over the line with his double-revert) ... Those were clearly bullshit rather than a simple disagreement.

        Greg Stein added a comment -

        Grant: what glass house am I in? And when did this conversation become about me?

        I am hearing opposing things here. How can I tell who is telling the truth, who is shading the truth, and who is outright lying? Answer: I can't. So instead, I say "you have a problem, and you need to get it fixed." If conversations reach this kind of rancor, then you have a problem. Simple as that.

        Mark Miller added a comment -

        Some background:

        The PMC should have stood up at that time and shut down those private conversations. That is a failing of the PMC.

        The PMC uniformly responded (those who responded) that they would not discuss off list. A couple of committers sent emails in response saying they would like to be heard but were not on the PMC list. That was actually settled about as nicely as you could ask, IMO. It's then up to any of the individuals to move the discussion to public mailing lists. This is not a new discussion - it's not something I personally was going to open up again - it's really up to those with larger stakes in the game to make this move - if someone is ready and willing to discuss, let's go - but the emails were not there until now.

        Also, no one has ever banned refactorings. The idea is absurd. I have heard Yonik more than once suggest that we stop merge issues until we figure this out. This is not a declaration on what you can do, Robert - and when you get one of those emails asking for your bank password, I wouldn't listen to it either. You are a PMC member - you know the Apache way - don't play these games about being banned - the closest I have seen to that kind of nonsense is this revert war!

        Grant Ingersoll added a comment -

        There are technical reasons, and they aren't necessarily bullshit, it's just that not everyone agrees on them. If you would like, we can link all the back issues. As Yonik has pointed out many times, factoring these things out makes it harder on some parts of the code and while others have pointed out it makes it better for other parts. While I believe it is a net positive despite the downsides, it isn't always cut and dried.

        As for the private conversations, not all of us are privy to them, so how does the PMC shut them down? Besides, people who live in glass houses shouldn't throw stones.

        Greg Stein added a comment -

        The PMC should have stood up at that time and shut down those private conversations. That is a failing of the PMC.

        Refactorings should be quite allowed, as all the code is under the purview of the PMC. It isn't "Lucene" and "Solr" ... it is lucene.apache.org. The PMC is responsible for the whole thing. If somebody takes issue with a refactoring, then they better have a damned good TECHNICAL reason to attempt a veto of it. And the PMC should be slapping down people who make repeated bullshit vetoes (again: remove their PMC membership and commit rights until they've learned to work with others).

        If you're not satisfied with the operation of the PMC, then raise it to the Board. That is what we are here for: to fix the PMC when it is broken. And as I've said: our fixes are usually pretty extreme, so you better try and work it out yourself first. Get your policy reset on those kinds of changes. Start an explicit conversation IN PUBLIC, rather than allowing yourself to be driven by crappy private bullshit. Shine a light on this stuff, or you're going to be stuck. And it'll be your own fault.

        Robert Muir added a comment -

        I'd really like to see it go in there, to be
        refactored to lucene at a later date when/if someone else has
        time/interest

        Jonathan: as would I, but this is currently impossible, as refactoring from solr to lucene is "not allowed".

        I'm sorry that you aren't able to view these discussions (nearly a month ago), because the people dictating these things refused to send them to a public mailing list, but the fact is, currently once the code goes in it cannot be moved.

        All this being said, I have no choice but to retract my objection (as Greg pointed out, it's completely incorrect). But I really do not like the fact that major decisions for this project are being dictated on private emails like this and nobody is doing anything about it.

        Mark Miller added a comment -

        Thanks Jonathan - very useful feedback. Appreciate your well thought out response and Solr user perspective on this particular issue. Sorry we have to air our dirty laundry all over the place. Transparency is what it is.

        Jonathan Rochkind added a comment -

        This is a feature that would be very useful to me, thus I've had a watch
        on the ticket.

        I am not a solr committer and have never even contributed a patch, just
        a user. But this is a feature I've been wanting for a while, and would
        be excited to see it in Solr. On the one hand I see the argument that
        it would ideally go at the lucene layer; on the other hand, if someone
        has working, tested, well-written code that is ready at the Solr
        layer, as a user, I'd really like to see it go in there, to be
        refactored to lucene at a later date when/if someone else has
        time/interest – rather than delaying working code indefinitely for
        hypothetical future (potentially time-consuming) refactoring at a
        different layer by yet-to-be-volunteered labor.

        I put this in only to make clear that there are users in the general
        population who would like to see this feature get into Solr sooner
        rather than later, delaying it does matter to 'ordinary' users. (I am
        neither a Solr contractor nor a customer of a Solr contractor, I don't
        have any such 'business interests', I am just an in-house developer who
        writes (open source) software on top of Solr. Not that there's anything
        wrong with being a Solr contractor, it's a fine way to fund open source
        development and I think it's odd to imply there's something wrong with
        it; I'm just saying I'm not one.)

        Hopefully you committers can work it out amongst yourselves in a way that
        balances the codebase's architectural health with the need to get
        working code out there.

        Greg Stein added a comment -

        Rather than discuss this in JIRA, I've started a thread to the PMC and the Board.

        Greg Stein added a comment -

        How about getting back to the issue at hand: improper vetoes (the implied ones here, and the Bad one that was attempted in LUCENE-2995), and unilateral reverts. The PMC needs to take action to fix the way this community operates.

        I will also state this for the record, since I've heard it from numerous people: the PMC should also ensure that this project is not controlled by any special business interests. If this friction is because of undue commercial influence, then the Board is going to be very reticent to take actions such as splitting it up. Board actions for bad PMC operation are, shall we say, a LOT more drastic.

        Mark Miller added a comment -

        Let's agree to be confused, you mean?

        Chris A. Mattmann added a comment -

        Actually no it's the opposite. Your community includes contributors not simply those with the commit bit.

        Mark Miller added a comment -

        I think you are confused - it's CTR for committers not contributors. You must have a committer willing to commit the code for you.

        Chris A. Mattmann added a comment -

        "Officially" Lucene/Solr may be CTR but not in practice. One need only look at field collapsing as evidence of this. Others have also noticed this over the years.

        Mark Miller added a comment -

        Greg -

        Actually, we are officially Commit-Then-Review FWIW. However, Lucene has a history of being non-aggressive here - and giving time for others to review patches. I think in general we have a great system for that. It's the more heated non-technical issues that would likely boil up regardless of CTR or RTC that cause issues.

        Chris A. Mattmann added a comment -

        Big +1 on CTR - if you're a committer you've earned the trust.

        Greg Stein added a comment -

        Personally, without my Director's hat on, I would recommend the project consider moving to Commit-Then-Review. Right now, you spend all this effort in posting patches to JIRA, discussing it ad nauseam, and the project just sits there. If... instead... you trust the committers to move the project forward, then let them commit. WHEN a problem comes up then, you discuss it. RTC implicitly says "we don't trust you", and I don't think it is a good model for project development. I've seen lots of people say "but the code is really tricky, so we need to review changes to ensure stability", but I think that is just a sham for people wanting control. You can always review what has been committed and apply further patches to fix stability – it doesn't have to come before the original commit with an updated patch. Release branches are typically in RTC, and the stability over an open trunk can be regained before release time. (of course, you would also hope test and regression suites will keep everything stable during the open trunk)

        Mark Miller added a comment -

        Ah - I jumped the gun on you. That is useful advice (call me biased if you'd like).

        Honestly, there is no way around this issue - of course we must discuss and come to some solution regarding Lucene/Solr at this point. This whole thing is very distressing to me. I think we all know it just sucks. In the past, I have enjoyed working with everyone here. My personal beef in this issue is not with that though - I simply am very offended by heavy-handed reverts and such. I feel the same way about JIRA. Words before action in my book - many words - hard as that can be for all of us sometimes. This has been the history of Lucene and the thing about the project that I admired most - heavy hands were checked at the door. I'm no angel, and I'm not perfect about this myself. But I still try and police it because I think it's important.

        Greg Stein added a comment -

        (stupid thing ended that comment early)

        First: you should not be allowed to veto a feature addition. There is no problem with that. If you don't like the location, then apply further patches to move it. But you don't stop it a priori.

        Second: you never, NEVER revert somebody else's commit. You only do that if they drop off the face of the earth. A veto on a commit is the beginning of a discussion, and a blocker for release. You have until then to reach consensus. The committer may realize his commit was bad for the side, and revert it himself. Or the group will find a solution, and the right (additional) patches will be applied. But you NEVER take matters into your own hands, unilaterally. If somebody repeated that action, I would ask the PMC to remove their commit rights, and the PMC damned well better recognize the anti-social and anti-project view of that committer and remove the rights. As a Director, I would absolutely support such action.

        Mark Miller added a comment -

        You guys need to fix yourself.

        That's some useful advice right there...

        Greg Stein added a comment -

        You guys need to fix yourself. And I'm not sure the Board is going to simply create a new TLP because you guys don't know how to operate correctly. That's just pushing around the deck chairs. The same fundamental "doesn't play with others" will not have been fixed.

        Yonik Seeley added a comment -

        Robert reverted my last commit again.
        This is intolerable.

        I've called a vote on general@l.a.o to spin off Solr into its own TLP.

        Michael McCandless added a comment -

        But if this is true, then it must also be true that refactoring is possible, and not just in theory.

        Exactly.

        So, Yonik: will you allow this Join code to be refactored to a shared Lucene/Solr module in the future?

        Steve Rowe added a comment -

        Mike M. wrote:

        for healthy Apache projects, where code can be freely refactored over time, it doesn't matter much where the initial commit goes. Progress not perfection...

        +1

        But it has become evident, recently, that efforts to refactor Lucene/Solr sources to their natural places are in fact strongly resisted

        I think Simon nailed it on the head: Lucene & Solr a one-way street?. I find it difficult to defend this relationship, which was supposed to be symbiotic (benefitting both Solr and Lucene), but which increasingly looks parasitic (i.e. most benefits accrue to Solr, and most costs are borne by Lucene).

        I agree with Mark: Robert's veto is political, in that it is based solely on a question of policy. I believe that Robert means to directly force re-examination of the Solr/Lucene merger, and vetoing this issue is a means to that end.

        Yonik wrote:

        It's simply preposterous that an improvement to Solr be blocked because some might want a lucene-usable module too.

        +1

        But if this is true, then it must also be true that refactoring is possible, and not just in theory.

        Yonik Seeley added a comment -

        Sorry, I can't entertain this nonsense.
        Lucene and Solr merged as equal projects. The domain of neither project shrunk. There is no "right place" for this code. People could have done the work with their lucene hat on or their solr hat on.

        The history behind this particular patch is that it was for a customer using Solr - and I scoped the time based on that.

        It's simply preposterous that an improvement to Solr be blocked because some might want a lucene-usable module too. It's just as preposterous to block a lucene improvement because solr can't use it yet. If someone wants to do something for lucene users, fine. But that currently has no bearing on this patch.

        Michael McCandless added a comment -

        Now that you have thrown down the gauntlet, I do want to rise to the challenge, but I don't know what it is!

        +1, this is hilarious. No matter what the gauntlet is, let's all rise to the occasion!

        Final location of code is not a good or valid reason to block commits IMO. That can be improved over time, when people back up their fricken wants with some code rather than roadblocks. This is a political revert request - though I'm sure you will now try and twist it to some technical BS.

        I generally agree with this, ie, for healthy Apache projects, where
        code can be freely refactored over time, it doesn't matter much where
        the initial commit goes. Progress not perfection...

        But it has become evident, recently, that efforts to refactor
        Lucene/Solr sources to their natural places are in fact strongly
        resisted (eg, Yonik has now vetoed LUCENE-2995, which looked like a
        great example of such a refactoring; LUCENE-2883 also met with
        resistance).

        Ie it now seems like we should try hard(er) to put new code in the
        right place, up front, because we are in fact not really free to later
        refactor it.

        So... here's an idea: how about if someone volunteered to take Yonik's
        latest patch here and make it a shared module, before committing?
        Yonik (and everyone else), would you be OK with that? (I'm not sure
        anyone would volunteer now, but, it seems like this would be a way out
        of this "impasse"...).

        Mark Miller added a comment -

        Instead, I would encourage you to raise your objection with the Lucene PMC and call a vote. At least this way we have the discussion.

        That's okay - I'll just voice my opinion. Your interactions with other committers and your conduct over time speak for themselves - I'm not going to code battle you - I'm just going to express my opinion.

        Mark Miller added a comment -

        And so I will argue that your veto is invalid. I don't believe you have voiced a valid technical concern. You said that a couple of committers think this could instead be integrated at the Lucene level, and that some Lucene users have voiced an interest in join. IMO, neither of these is a good reason to revert this feature, nor a valid technical reason for a code veto.

        Robert Muir added a comment -

        ok, here is my formal veto:

        -1: The way in which we integrate features like this is important, because it defines the architecture of Lucene/Solr. On this issue two people voiced concerns that this feature is not being added at the correct "level", that it is a low-level search feature that belongs in the search engine library (Lucene). Lucene users have previously voiced desires for features like this, you can see some previous mailing list discussion by searching mail archives (http://www.lucidimagination.com/search/?q=join) and looking at the Lucene facet.

        If you think my veto is not valid, please be aware I'm not going to get into a shouting match with you about it. Instead, I would encourage you to raise your objection with the Lucene PMC and call a vote. At least this way we have the discussion.

        Mark Miller added a comment -

        My reasons are technical, in that code is not committed to the right place.

        I think Apache thinking tends to lean against this as a technical veto - that's what I've gathered based on fallout discussion over the lucene-solr merge threads. But we can argue about that once you actually make a proper veto.

        Mark Miller added a comment -

        You misunderstand. Because a revert is a big deal, what I'm saying is that you are supposed to be formal about it.

        For example:

        -1. Here is my technical reason.

        Just pointing to two non-concrete comments, way up in the issue, about whether this should be a Lucene feature is not sufficient.

        Robert Muir added a comment -

        Hi Mark, my objections (Simon also voiced concerns) are raised earlier on this issue, if you scroll up. My reasons are technical, in that code is not committed to the right place.

        All we need to do here is discuss a solution that achieves consensus. I'll be more than happy to engage in more details, but at the moment I am trying to calm my anger from being accused of a 'political revert'.

        If you really think I have some ulterior/political motive, then voice it. This is a really big deal.

        Mark Miller added a comment -

        Hey Robert,

        A revert is historically considered a big deal. You should really officially -1 the commit and give your valid technical reason for doing so before being overzealous about a quick revert.

        Robert Muir added a comment -

        I've reverted until there is consensus on this issue.

        Mark Miller added a comment -

        I suspected, and now think more strongly, that there is a language confusion issue between us in the last few comments. That is, I don't think we both mean the same thing by "this" and "it" in our respective responses...

        I'm referring to the 'twist'. It sounds like you are talking about a fist fight?

        I suppose this comes down to the fact that:

        are you really thinking this is gonna happen...

        is somewhat ambiguous. But it seems as good a guess as any that you are talking about the 'twist' and not something that you want to see me do.

        Now that you have thrown down the gauntlet, I do want to rise to the challenge, but I don't know what it is!

        Simon Willnauer added a comment -

        oh, yes, I do.

        go ahead I wanna see it

        Mark Miller added a comment -

        are you really thinking this is gonna happen...

        oh, yes, I do.

        Simon Willnauer added a comment -

        Final location of code is not a good or valid reason to block commits IMO. That can be improved over time, when people back up their fricken wants with some code rather than roadblocks. This is a political revert request - though I'm sure you will now try and twist it to some technical BS.

        are you really thinking this is gonna happen...

        Please revert.

        I agree, this is not how it works here IMO, so it has nothing to do with politics... But honestly, I am so sick of this discussion.

        Mark Miller added a comment -

        Great - that's the path you want to take on this?

        When Lucene/Solr becomes about blocking legitimate features rather than submitting patches along the direction you want, I'm out of here.

        Final location of code is not a good or valid reason to block commits IMO. That can be improved over time, when people back up their fricken wants with some code rather than roadblocks. This is a political revert request - though I'm sure you will now try and twist it to some technical BS.

        Awesome - just awesome.

        Robert Muir added a comment -

        So you just totally ignore 2 objections from other committers and commit anyway?

        Please revert.

        Yonik Seeley added a comment -

        Committed. I'll leave the issue open until some docs get added to the wiki.

        Yonik Seeley added a comment -

        Here's a new patch updated for trunk, that also adds the cross-core join.

        Example:

        {!join from=fromField to=toField fromIndex=fromCoreName}fromQuery

        I think this is ready to commit!
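        For anyone trying the cross-core form from a client, here is a minimal Python sketch that only builds the request URL and never contacts a server. The host, port, core names and field names are placeholders I have assumed for illustration. The local-params string is exactly the one in the example above, and `wt=json` is just the ordinary Solr response-format parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL: host, port and core names are assumptions, not from the patch.
SOLR_BASE = "http://localhost:8983/solr"


def cross_core_join_url(to_core, from_field, to_field, from_core, from_query):
    """Build a /select URL on `to_core` whose main query is a cross-core join.

    The {!join ...} local params select the join parser; the text after the
    closing brace is the query evaluated against `from_core`."""
    q = "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_core, from_query)
    # urlencode percent-escapes '{', '!', '=', '}' and turns spaces into '+'.
    return "%s/%s/select?%s" % (SOLR_BASE, to_core, urlencode({"q": q, "wt": "json"}))


url = cross_core_join_url("toCoreName", "fromField", "toField",
                          "fromCoreName", "fromQuery")
# Decoding the query string gives back the exact local-params syntax.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(decoded_q)
```

        Escaping the whole `q` value in one place avoids the common mistake of pasting braces and spaces into a URL by hand.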

        Gerd Bremer added a comment -

        Is it possible to sort the join query result?

        // first class of documents with refid and pagecount fields;
        // a refid field maps to an id field in the second class of documents (1->100, 2->101)
        doc1:
        ------
        id:1
        refid:100
        pagecount:35

        doc2:
        ------
        id:2
        refid:101
        pagecount:45

        // second class of documents with text field
        doc100:
        ------
        id:100
        text:hello world

        doc101:
        ------
        id:101
        text:goodbye

        Now I would like to select the documents from the first class sorted by pagecount in descending order, that is {doc2, doc1}, and return the mapped documents with text in the same order, that is {doc101, doc100}. Is this possible with join? I'm looking for an alternative to partial update, and this join looks promising if I can sort and get the mapped result in the same order.
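        As I understand the join in this issue, it maps one set of IDs to another and behaves like a filter. The "to" documents come back as a set, and the sort order of the "from" documents is not carried across. A common workaround is a two-step lookup done by the client. The sketch below models both behaviours in plain Python using the data from the comment above. It is an in-memory illustration, not Solr code, and the helper names are hypothetical.

```python
# In-memory model of the example above (not Solr code; helper names are hypothetical).
first_class = [
    {"id": 1, "refid": 100, "pagecount": 35},
    {"id": 2, "refid": 101, "pagecount": 45},
]
second_class = [
    {"id": 100, "text": "hello world"},
    {"id": 101, "text": "goodbye"},
]


def join_as_set(from_docs, from_field, to_docs, to_field):
    """Join-style mapping: returns the ids of matching 'to' docs as a set,
    so any ordering of the 'from' side is lost."""
    keys = {d[from_field] for d in from_docs}
    return {d["id"] for d in to_docs if d[to_field] in keys}


def ordered_two_step(from_docs, sort_field, from_field, to_docs, to_field):
    """Client-side workaround: sort the 'from' docs (descending), then emit
    the 'to' docs in the order their join keys are first encountered."""
    ranked = sorted(from_docs, key=lambda d: d[sort_field], reverse=True)
    by_key = {}
    for d in to_docs:
        by_key.setdefault(d[to_field], []).append(d)
    result, seen = [], set()
    for d in ranked:
        key = d[from_field]
        if key not in seen:
            seen.add(key)
            result.extend(by_key.get(key, []))
    return result


unordered = join_as_set(first_class, "refid", second_class, "id")
ordered = ordered_two_step(first_class, "pagecount", "refid", second_class, "id")
print(sorted(unordered))            # [100, 101]
print([d["id"] for d in ordered])   # [101, 100]
```

        In Solr terms, the first step would presumably be an ordinary query on the first class, sorted by pagecount descending and returning refid. The second step would be a lookup by id on the second class. The reordering happens in the client.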

        Briggs Thompson added a comment -

        As I write this I found https://issues.apache.org/jira/browse/SOLR-1131 so I guess you can have a field type that has multiple nested field types! That is pretty cool - I will have to play around with that.

        Briggs Thompson added a comment -

        I was thinking more of the case where two indexes have completely different schemas, with a one-to-many relationship between their documents. For example, in the schemas below, one Schema1 document may have 100 Schema2 documents associated with it.

        Schema1:
        documentId : int (unique key)
        field1
        field2
        field3 ...

        Schema2
        productId : int (unique key)
        documentId : int
        field1
        field2
        field3 ...

        I guess what would be necessary to do this within a single index schema is to implement a custom class (solr.product), then have a multivalued field of a type backed by that custom class. Are there examples where something similar is implemented? I would also have to get rid of the unique key (or create a copy field or something along those lines).

        You mentioned that sorting takes memory for every document, regardless of whether the document contains a value for the field. Is the same true for querying? I am worried that even if the above works, performance would be impacted substantially, considering you are turning an index with X documents into an index with 2X documents, plus the join (I don't know what kind of performance impact that has).

        Thanks for your help Yonik!
        Briggs

        Yonik Seeley added a comment -

        I don't think that would really work if it is a one to many relationship but thank you for your response!

        I don't see why not... the disadvantages of having everything in a single index are:

        • you can't use the same field name for different things; a given field must have the same type for both document types
        • efficiency and sparse fields - sorting on a field takes some memory for every document in the index, regardless of how many documents have that field

        Perhaps you could give a small example of how something could work in 2 indexes but not 1?
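        To make the single-index workaround concrete for a one-to-many case like Schema1/Schema2, here is a small in-memory sketch. It assumes a merged index where every document carries a hypothetical `doc_type` field and a synthesized unique `id`. Both are my assumptions, not something from the thread. Many product documents point at one parent document through `documentId`. Because both document types share the `documentId` field, the joined set contains products as well as documents, so a filter on the "to" side is still needed.

```python
# Hypothetical merged index: `doc_type` and the synthesized `id` values are assumptions.
index = [
    {"id": "doc-1", "doc_type": "document", "documentId": 1, "body": "alpha"},
    {"id": "doc-2", "doc_type": "document", "documentId": 2, "body": "beta"},
    {"id": "prod-10", "doc_type": "product", "documentId": 1, "productId": 10},
    {"id": "prod-11", "doc_type": "product", "documentId": 1, "productId": 11},
    {"id": "prod-12", "doc_type": "product", "documentId": 2, "productId": 12},
]


def join(docs, from_pred, from_field, to_field):
    """Collect from_field values of docs matching from_pred, then return every
    doc whose to_field holds one of those values (many-to-one is fine)."""
    keys = {d[from_field] for d in docs if from_pred(d) and from_field in d}
    return [d for d in docs if to_field in d and d[to_field] in keys]


def product_in_range(d):
    return d["doc_type"] == "product" and 10 <= d["productId"] <= 11


joined = join(index, product_in_range, "documentId", "documentId")
# The joined set mixes both types because they share documentId; filter the 'to' space.
parents = [d["id"] for d in joined if d["doc_type"] == "document"]
print([d["id"] for d in joined])  # ['doc-1', 'prod-10', 'prod-11']
print(parents)                    # ['doc-1']
```

        As a request, this would correspond roughly to `q={!join from=documentId to=documentId}doc_type:product AND productId:[10 TO 11]` plus `fq=doc_type:document`. Treat that as an untested sketch rather than a verified query.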

        Briggs Thompson added a comment -

        "The workaround today would be to just add both document types to the same core (merge the schemas so one schema can support both document types)."

        I don't think that would really work if it is a one to many relationship but thank you for your response!

        Jonathan Rochkind added a comment -

        Even if the cores aren't in the same JVM, one could imagine a weird
        cross-host join where Solr actually connected to the external Solr,
        issued a query (perhaps using Solr binary format for efficiency, SolrJ
        style), got the list of values returned for a particular stored field,
        and used that as a filter on the current query.

        This is in fact similar to something that's come up (not sure if it's in
        a different ticket or just on listserv) about doing a similar thing with
        an external SQL query, where the result of some single-column SQL
        against an external database is used as a filter in the current query.
        Really the exact same problem, just a question of whether the external
        query is to SQL via JDBC or what have you, or instead to another core
        via SolrJ style connection. Either way do an external query, end up
        with a list of values, and want to use that as efficiently as possible
        (ie, NOT using lucene 'or' with hundreds or thousands of clauses!) as a
        filter on a particular solr indexed field for the current query.

        But clearly that enhancement would be a different ticket/patch – if the
        'join' patch as currently spec'd were to make it into Solr as is (same
        core join only) I'd be overjoyed, it would be awfully useful just as it
        is, so I do not suggest that its scope be increased, thus raising the bar
        for the feature as is.
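        The "external query, then filter on the returned values" pattern can be sketched without Solr at all. This Python example uses the standard-library `sqlite3` module as the external database, and the table and column names are made up for illustration. The point is the shape of the filter: the returned values become a hash set. Each document then costs one membership test instead of being matched against a boolean OR with hundreds of clauses. This models the idea only and is not any Solr API.

```python
import sqlite3


def external_values(conn, sql):
    """Run a single-column query and return its values as a set."""
    return {row[0] for row in conn.execute(sql)}


def filter_by_values(docs, field, allowed):
    """Keep docs whose `field` value is in `allowed`: one hash lookup per doc."""
    return [d for d in docs if d.get(field) in allowed]


# Hypothetical external database standing in for a JDBC source or another core.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entitlements (item_id TEXT, active INTEGER)")
conn.executemany("INSERT INTO entitlements VALUES (?, ?)",
                 [("a", 1), ("b", 0), ("c", 1)])

allowed = external_values(conn, "SELECT item_id FROM entitlements WHERE active = 1")
docs = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]
kept = [d["id"] for d in filter_by_values(docs, "id", allowed)]
print(kept)  # ['a', 'c']
```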

        Yonik Seeley added a comment -

        I am interested in joining on fields of a different schema (multiple core join).

        No, it's not currently possible, but it's an interesting idea, and seems doable provided the cores are in the same solr JVM.

        The workaround today would be to just add both document types to the same core (merge the schemas so one schema can support both document types).

        Briggs Thompson added a comment - - edited

        From the examples provided it doesn't look like this is possible but I just want to confirm with you guys. It looks like all of the examples are joins on fields within the same schema, but I am interested in joining on fields of a different schema (multiple core join).

        I haven't played around with the join yet, but would the following be possible?

        Core1
        docId: int
        body : text

        Core2
        id: int
        docId : int

        q={!join from=Core1.docId to=Core2.docId}Core1.body:"super" AND Core2.ID:[1 TO 10]

        We have a use case similar to Tanguy Moal's example, where one document type needs updating more often than the other. I know we could store the Core2 ids in an array of integers in Core1, but every time the ID mappings change we would have to re-index (potentially re-indexing nearly the same data constantly).

        Thanks for your help,
        Briggs

        Bojan Smid added a comment -

        Great, thanks a lot, Yonik.

        Yonik Seeley added a comment -

        Bojan wrote: "However, it doesn't apply on current trunk any more."

        Here's a refresh.

        Bojan Smid added a comment -

        Very nice patch, Yonik. However, it doesn't apply to the current trunk anymore. Does anyone, by any chance, have a fresh version of this patch?

        Tanguy Moal added a comment -

        Thanks, Yonik. Indeed, I wasn't aware of that space for filtering (or scope, or whatever).

        That did the trick; thank you very much. I'll continue my experiment; it sounds very good!

        Yonik Seeley added a comment -

        Tanguy wrote: "Yonik, that feature is pretty interesting, but sounds more like a mapping or replacement than a join to me..."

        Yeah, I agree; it's closer to a semi-join (http://en.wikipedia.org/wiki/Hash_join#Hash_semi-join).
        I experimented with calling it pivot or map, but I think these semantics are realistically the closest we'll get to a join.

        Tanguy wrote: "Unfortunately, I thought I'd be able to perform search restrictions on the union produced by the join, but I wasn't able to do so"

        This is mapping from one id space to another, so you need to filter in the appropriate space.
        You're doing a self-join here, so it's a more complex example to figure out.
        What about something like this?
        q={!join from=pivot to=pivot}price:[* TO 200]&fq=title:great
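        To make the "filter in the appropriate space" point concrete, here is a toy Python model of the join as a semi-join over Tanguy's two documents. It is a sketch under stated assumptions, not how Solr evaluates the query: fields are single-valued, title matching is plain whitespace tokenization, and price_le, title_has and join are hypothetical helpers.

```python
# Toy semi-join model of Tanguy's example (hypothetical helpers, not Solr code).
DOCS = {
    "1": {"price": 150, "pivot": "pivot1"},
    "2": {"title": "great title", "pivot": "pivot1"},
}

def price_le(limit):
    """Doc ids matching price:[* TO limit]."""
    return {d for d, f in DOCS.items() if "price" in f and f["price"] <= limit}

def title_has(word):
    """Doc ids whose title contains `word` (whitespace tokenization assumed)."""
    return {d for d, f in DOCS.items() if word in f.get("title", "").split()}

def join(from_set, from_field, to_field):
    """Semi-join: docs whose to_field value equals the from_field value of any doc in from_set."""
    keys = {DOCS[d][from_field] for d in from_set if from_field in DOCS[d]}
    return {d for d, f in DOCS.items() if f.get(to_field) in keys}

# q={!join from=pivot to=pivot}title:great
print(sorted(join(title_has("great"), "pivot", "pivot")))  # ['1', '2']
# q={!join from=pivot to=pivot}price:[* TO 200]
print(sorted(join(price_le(200), "pivot", "pivot")))  # ['1', '2']
# AND applied before the join, in the "from" space: no single doc has both fields
print(sorted(join(price_le(200) & title_has("great"), "pivot", "pivot")))  # []
# Join on price, then filter in the joined ("to") space, like fq=title:great
print(sorted(join(price_le(200), "pivot", "pivot") & title_has("great")))  # ['2']
```

        The third query is empty because the conjunction is evaluated before the mapping, where no single document carries both a price and a title; moving title:great into an fq applies it after the mapping instead.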

        Simon Willnauer added a comment -

        Robert wrote: "But we can contribute this to the right place. Things like queries and query parsers don't need to be added to Solr only. It's easier to do this up-front than after the fact. I felt like it was pulling teeth with the analyzers, etc."

        I 100% agree with Robert. I think we should not make the same mistakes we did with faceting (I know Solr and Lucene were two separate projects at that time, so don't get me wrong!!). Stuff like this should be available to all users; Solr is just the sugar on top, and there are many use cases where folks use something in between.

        simon

        Tanguy Moal added a comment -

        Yonik, that feature is pretty interesting, but sounds more like a mapping or replacement than a join to me...

        I have a use case where my documents have some fields that get updated frequently and others that don't. Also, the fields that aren't updated frequently are sometimes shared with other documents. When I saw this issue, I decided to give it a try: I split my old documents into new ones, using two documents to represent each old document, plus a common "pivot" field.

        At first, a simple substitution occurred because the children didn't have the same "from" field value; I worked around that easily. I now have twice as many documents, as expected. Unfortunately, I thought I'd be able to perform search restrictions on the union produced by the join, but I wasn't able to do so... Did I miss something?

        A little example:

        Fields:

        • title : text
        • price : sint
        • id : string
        • pivot : string

        I push doc 1: "id", "1", "price", "150", "pivot", "pivot1" and doc 2: "id", "2", "title", "great title", "pivot", "pivot1"

        I search:

        • q={!join from=pivot to=pivot}title:great => got 2 docs, perfect
        • q={!join from=pivot to=pivot}price:[* TO 200] => got 2 docs, so far so good
        • q={!join from=pivot to=pivot}price:[* TO 200]+AND+title:great => no results found. Of course that makes sense, since no single document matches this conjunction, but I thought that by "joining" I'd be able to do so...

        What's your point of view?

        Thanks in advance

        Robert Muir added a comment -

        Yonik wrote: "This was done for a customer, and it's being contributed back now."

        But we can contribute this to the right place. Things like queries and query parsers don't need to be added to Solr only.
        It's easier to do this up-front than after the fact. I felt like it was pulling teeth with the analyzers, etc.

        Yonik Seeley added a comment -

        This was done for a customer, and it's being contributed back now.

        Robert wrote: "There was already some discussion on parent-child relationships in Lucene"

        Yeah, I think that work should continue! It would be great to get some index-level support for relationships.

        Robert Muir added a comment -

        Yonik, can we expose this to Lucene users too?

        There was already some discussion on parent-child relationships in Lucene, and I don't see any reason stuff like this should be added to Solr only.

        Yonik Seeley added a comment -

        Here's a patch with tests that implements the first algorithm for joining (just as with faceting, there will be multiple algorithms going forward).

        This implements a many-to-many join: documents are mapped based on matching terms in indexed fields. It's all computed on the fly, so there's nothing to declare up front. You can look at the tests for examples of how to use this.

        Another example: to find the parents of all blue-eyed children, simply do:
        fq={!join from=parent to=name}eyes:blue
        Or you can join in the reverse direction to find the children of all blue-eyed parents:
        fq={!join from=name to=parent}eyes:blue
        Or you can even do a self-join to find everyone with the same eye color as johnny, without knowing what that is:
        fq={!join from=eyes to=eyes}name:johnny
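        The three example joins can be modeled in a few lines of Python. This is only a set-based sketch of the many-to-many semantics, assuming a made-up four-person dataset in which each field holds a set of indexed terms; match, join and names are hypothetical helpers, not Solr APIs, and this is not the patch's algorithm.

```python
# Hypothetical documents; each field maps to a set of indexed terms,
# since the join matches documents on indexed tokens.
DOCS = {
    1: {"name": {"johnny"}, "eyes": {"blue"}, "parent": {"mary"}},
    2: {"name": {"susie"}, "eyes": {"brown"}, "parent": {"bob"}},
    3: {"name": {"mary"}, "eyes": {"blue"}},
    4: {"name": {"bob"}, "eyes": {"green"}},
}

def match(field, term):
    """Doc ids whose field contains term (a single-term query such as eyes:blue)."""
    return {d for d, f in DOCS.items() if term in f.get(field, set())}

def join(from_field, to_field, matched):
    """Collect every from_field term of the matched docs, then return all docs
    whose to_field contains at least one of those terms."""
    terms = set().union(*(DOCS[d].get(from_field, set()) for d in matched))
    return {d for d, f in DOCS.items() if f.get(to_field, set()) & terms}

def names(ids):
    """Sorted names of the given doc ids, for readable output."""
    return sorted(n for d in ids for n in DOCS[d]["name"])

# fq={!join from=parent to=name}eyes:blue  -> parents of blue-eyed children
print(names(join("parent", "name", match("eyes", "blue"))))  # ['mary']
# fq={!join from=name to=parent}eyes:blue  -> children of blue-eyed parents
print(names(join("name", "parent", match("eyes", "blue"))))  # ['johnny']
# fq={!join from=eyes to=eyes}name:johnny  -> same eye color as johnny
print(names(join("eyes", "eyes", match("name", "johnny"))))  # ['johnny', 'mary']
```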

        This current algorithm is like facet.method=enum - it's more efficient when there are fewer terms in the fields being joined.
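        One way to picture an enum-style join, by analogy with facet.method=enum, is to walk the terms of the "from" field instead of the matching documents. The Python sketch below assumes a toy in-memory inverted index; build_index and join_enum are hypothetical names, and this is an illustration of the idea rather than the patch's actual code. The loop body runs once per distinct term in the from field, which is why this approach favors fields with fewer terms.

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: field -> term -> set of doc ids (the postings)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, terms in fields.items():
            for term in terms:
                index[field][term].add(doc_id)
    return index

def join_enum(index, from_field, to_field, from_set):
    """Walk every term of from_field; if its postings intersect from_set,
    add the postings of the same term in to_field to the result."""
    result = set()
    to_terms = index.get(to_field, {})
    for term, postings in index.get(from_field, {}).items():
        if postings & from_set:                  # term occurs in some matching doc
            result |= to_terms.get(term, set())  # map to docs holding that term
    return result

# Hypothetical data: johnny (1) and susie (2) are children of mary (3) and bob (4).
DOCS = {
    1: {"name": {"johnny"}, "eyes": {"blue"}, "parent": {"mary"}},
    2: {"name": {"susie"}, "eyes": {"brown"}, "parent": {"bob"}},
    3: {"name": {"mary"}, "eyes": {"blue"}},
    4: {"name": {"bob"}, "eyes": {"green"}},
}
index = build_index(DOCS)
blue = index["eyes"].get("blue", set())
print(sorted(join_enum(index, "parent", "name", blue)))  # [3]
print(sorted(join_enum(index, "name", "parent", blue)))  # [1]
```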


          People

          • Assignee: Unassigned
          • Reporter: Yonik Seeley
          • Votes: 5
          • Watchers: 27

            Dates

            • Created:
              Updated:
              Resolved:

              Development