Solr
  1. Solr
  2. SOLR-1516

DocumentList and Document QueryResponseWriter

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None
    • Environment:

      My MacBook Pro laptop.

      Description

      I tried to implement a custom QueryResponseWriter the other day and was amazed at the level of unmarshalling and weeding through objects that was necessary just to format the output o.a.l.Document list. As a user, I wanted to be able to implement either 2 functions:

      • process a document at a time, and format it (for speed/efficiency)
      • process all the documents at once, and format them (in case an aggregate calculation is necessary for outputting)

      So, I've decided to contribute 2 simple classes that I think are sufficiently generic and reusable. The first is o.a.s.request.DocumentResponseWriter – it handles the first bullet above. The second is o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and require the user to implement either an #emitDoc function (in the case of bullet 1), or an #emitDocList function (in the case of bullet 2). Both classes provide an #emitHeader and #emitFooter function set that handles formatting and output before the Document list is processed.

      1. SOLR-1516.Mattmann.101809.patch.txt
        10 kB
        Chris A. Mattmann
      2. SOLR-1516.Mattmann.112409.patch.txt
        20 kB
        Chris A. Mattmann
      3. SOLR-1516.patch
        3 kB
        Noble Paul
      4. SOLR-1516.patch
        9 kB
        Noble Paul
      5. SOLR-1516.patch
        6 kB
        Noble Paul

        Activity

        Hide
        Shalin Shekhar Mangar added a comment -

        Moving to 1.5. We are not accepting new features for 1.4 anymore.

        Show
        Shalin Shekhar Mangar added a comment - Moving to 1.5. We are not accepting new features for 1.4 anymore.
        Hide
        Chris A. Mattmann added a comment -

        Hi All:

        I don't mean to be a pest here, but I've seen the amount of activity going on the SOLR lists recently, as well as the decision to hold off on calling for a vote on 1.4 until Lucene 2.9.1 is released. This patch is self-contained, doesn't touch any code, and honestly, it only adds functionality that would have made my life as a user of SOLR a lot easier (I would have saved the hour of debugging and printing out #getClass on the Objects in NamedList, and on top of that only had to implement an #emitDoc or #emitDocList function and optionally #emitHeader and #emitFooter, rather than the rest of the supporting code).

        Am I the only one that's run into a problem trying to write a custom XML SOLR output that's inherently simple? That is, XML output that doesn't need to worry about the inherent types of the named values in the NamedList, output that only cares about spitting out the set of returned Documents?

        It would be great to see this get into 1.4, but if I'm the outlier, I can wait. Just thought I'd raise the issue.

        Cheers,
        Chris

        Show
        Chris A. Mattmann added a comment - Hi All: I don't mean to be a pest here, but I've seen the amount of activity going on the SOLR lists recently, as well as the decision to hold off on calling for a vote on 1.4 until Lucene 2.9.1 is released. This patch is self-contained, doesn't touch any code, and honestly, it only adds functionality that would have made my life as a user of SOLR a lot easier (I would have saved the hour of debugging and printing out #getClass on the Objects in NamedList, and on top of that only had to implement an #emitDoc or #emitDocList function and optionally #emitHeader and #emitFooter, rather than the rest of the supporting code). Am I the only one that's run into a problem trying to write a custom XML SOLR output that's inherently simple? That is, XML output that doesn't need to worry about the inherent types of the named values in the NamedList, output that only cares about spitting out the set of returned Documents? It would be great to see this get into 1.4, but if I'm the outlier, I can wait. Just thought I'd raise the issue. Cheers, Chris
        Hide
        Chris A. Mattmann added a comment - - edited

        I haven't really heard any comments on this issue, and I've got the impression that not many folks write these QueryResponseWriters. To me, writing one was invaluable. The use case was:

        • I make the choice to make SOLR the gold source for search index data (I'm dealing with planetary science and earth science data on 4-5 projects)
        • I want to drive search but also met output from SOLR (treating SOLR as a search web service, with customizable output [2])
        • the default SOLR XML and the 5-7 output formats didn't do it for me since I have some specialized earth and planetary science use cases. E.g., on a few different projects, I need to be able to:
        • output FGDC XML (yes it's a standard for earth science metadata, and also relevant for the GeoSOLR stuff)
        • output custom RDF metadata
        • output a particular style of JSON to plug in to some external web client, e.g., an auto-suggest that requires its own JSON format, not SOLR's

        To illustrate the reason that the 5-7 output formats didn't do it for me either, I'll use an example. There may be the sense of, "well why didn't I write some Java/Ruby/PHP/Python client that called SOLR and one of it's existing wt's and then output a custom format from your favorite programming language (PL)"? The reasons are three fold:

        1. SOLR advertises that the QueryResponseWriter interface is an official SOLR plugin and interface, at least according to:

        • the Wiki documentation [1]
        • the advertised published book on SOLR [2]
        • Chris Hostetter's ApacheCon08 slides as part of the core SOLR architecture in his 50K foot view diagram [3]

        2. If SOLR is truly a search web service, and allows for changeable output formats (evidenced by exposing the wt parameter), then why force people to use one of the existing wt's and then ask them to transform (either via a PL, or via XSLT) instead of allowing them to natively generate the specific output format type?

        3. Why make o.a.l.r.QueryResponseWriter an interface and not a concrete class if it is never intended to be implemented by others, or more importantly, is kind of non-intuitive to implement?

        Besides 1-3 for me, I have external COTS and OTS tools that cannot be changed and that expect data to be loaded into them in a particular format, and I'd like to plug them into SOLR and the easiest way for me to do that is with a curl/wget type operation and then a pipe into the COTS/OTS tool, and wt's are the way to go for that.

        So, given the above, when I went to write a "wt" I was surprised how hard it was for me to understand the NamedList structure which is just a bag of objects that you have to unpack with unfriendly instanceof checks and recursive unmarshalling (walking the NamedList tree). All I wanted for my wt was to be able to format the output Document List or on a Doc-by-doc basis.

        Anyways just wanted to provide some further fodder and discussion for this issue. To me this is important, and clearly, based on [1-3], QueryResponseWriters by definition seem to be a big piece of the SOLR architecture.

        Chris


        [1] http://wiki.apache.org/solr/QueryResponseWriter
        [2] http://people.apache.org/~hossman/apachecon2008us/btb/apache-solr-beyond-the-box.pdf
        [3] SOLR 1.4 Enterprise Search Server, Packt Publishing, 2009.

        Show
        Chris A. Mattmann added a comment - - edited I haven't really heard any comments on this issue, and I've got the impression that not many folks write these QueryResponseWriters. To me, writing one was invaluable. The use case was: I make the choice to make SOLR the gold source for search index data (I'm dealing with planetary science and earth science data on 4-5 projects) I want to drive search but also met output from SOLR (treating SOLR as a search web service, with customizable output [2] ) the default SOLR XML and the 5-7 output formats didn't do it for me since I have some specialized earth and planetary science use cases. E.g., on a few different projects, I need to be able to: output FGDC XML (yes it's a standard for earth science metadata, and also relevant for the GeoSOLR stuff) output custom RDF metadata output a particular style of JSON to plug in to some external web client, e.g., an auto-suggest that requires its own JSON format, not SOLR's To illustrate the reason that the 5-7 output formats didn't do it for me either, I'll use an example. There may be the sense of, "well why didn't I write some Java/Ruby/PHP/Python client that called SOLR and one of it's existing wt's and then output a custom format from your favorite programming language (PL)"? The reasons are three fold: 1. SOLR advertises that the QueryResponseWriter interface is an official SOLR plugin and interface, at least according to: the Wiki documentation [1] the advertised published book on SOLR [2] Chris Hostetter's ApacheCon08 slides as part of the core SOLR architecture in his 50K foot view diagram [3] 2. If SOLR is truly a search web service, and allows for changeable output formats (evidenced by exposing the wt parameter), then why force people to use one of the existing wt's and then ask them to transform (either via a PL, or via XSLT) instead of allowing them to natively generate the specific output format type? 3. Why make o.a.l.r.QueryResponseWriter an interface and not a concrete class if it is never intended to be implemented by others, or more importantly, is kind of non-intuitive to implement? Besides 1-3 for me, I have external COTS and OTS tools that cannot be changed and that expect data to be loaded into them in a particular format, and I'd like to plug them into SOLR and the easiest way for me to do that is with a curl/wget type operation and then a pipe into the COTS/OTS tool, and wt's are the way to go for that. So, given the above, when I went to write a "wt" I was surprised how hard it was for me to understand the NamedList structure which is just a bag of objects that you have to unpack with unfriendly instanceof checks and recursive unmarshalling (walking the NamedList tree). All I wanted for my wt was to be able to format the output Document List or on a Doc-by-doc basis. Anyways just wanted to provide some further fodder and discussion for this issue. To me this is important, and clearly, based on [1-3] , QueryResponseWriters by definition seem to be a big piece of the SOLR architecture. Chris — [1] http://wiki.apache.org/solr/QueryResponseWriter [2] http://people.apache.org/~hossman/apachecon2008us/btb/apache-solr-beyond-the-box.pdf [3] SOLR 1.4 Enterprise Search Server, Packt Publishing, 2009.
        Hide
        Shalin Shekhar Mangar added a comment -

        Chris, it is not about whether we want to encourage people to write custom QueryResponseWriters or not. We had an agreement on where to draw the line for accepting new issues. Had this issue been opened a few weeks earlier we would have pushed it into 1.4

        We can work on this as soon as 1.4 is out. You can either use your internal patched version or 1.5 trunk once this patch is committed.

        Show
        Shalin Shekhar Mangar added a comment - Chris, it is not about whether we want to encourage people to write custom QueryResponseWriters or not. We had an agreement on where to draw the line for accepting new issues. Had this issue been opened a few weeks earlier we would have pushed it into 1.4 We can work on this as soon as 1.4 is out. You can either use your internal patched version or 1.5 trunk once this patch is committed.
        Hide
        Chris A. Mattmann added a comment -

        Shalin, I guess what I'm trying to raise is a larger issue. You mention "we" had an agreement on where to draw the line for "new issues". Well, what's a "new issue"? And, who's "we"? What's SOLR's process for new features versus bug fixes, etc.? I don't see it documented anywhere but I think it seems pretty clear to you what the process is. Admittedly I've only been actively monitoring the SOLR list for a few weeks now, so perhaps it was discussed on the mailing lists before at some point. Is this something that can be put up (or is already) on the Wiki? Or perhaps you can point me to it?

        I know in 2005 for Nutch, we had a release vote called on 0.7 which was a large release and a big time production like Solr 1.4. I also know that, at the last minute, someone mentioned that there was an RSSParser out there in patch land, that only added functionality, didn't take anything away, didn't touch core modules, and was sufficiently commit worthy (javadocs, unit tests [if applicable], etc.) Nutch was very much in the same type of release cycle back in that day that Solr was: huge release, many months in between releases, etc., so it was just as sacrilegious back then for these folks to ask whether the RSSParser could be included in Nutch at the last minute before a release vote, and after a discussed "freeze" on features. However, the Nutch committers took a step back, discussed the pros/cons (briefly, nothing major) over a few emails and decided why not? At the time, Nutch was producing a release pretty infrequently, so getting something into a release meant it was going to be out there for more people to use, touch, test, and leverage, which was a good thing since the next release wouldn't be coming for a while. And, I think we can all agree, releases are different than trunk, or patched versions of software. In many organizations, folks just aren't comfortable with saying, "depend on the latest trunk" or "depend on release X + patch Y". Releases sound more official to management.

        I'm fine with this patch not getting into 1.4 and I'll drop it since no one else really seems to be commenting on this and I was just trying to see if there was interest, which I hope is something that's allowed and encouraged over here in SOLR land. Everyone wants to feel like there is opportunity to be included and to ask questions and everyone wants a chance to have their patch included in a big release that will be widely disseminated.Beyond that, thinking through this issue made me ask myself yesterday (and with my earlier comment), well, "how are the SOLR users writing these response writers?", or, "are they even doing it that much?" So, I was trying to use this issue as a placeholder to obtain feedback on that.

        Chris

        Show
        Chris A. Mattmann added a comment - Shalin, I guess what I'm trying to raise is a larger issue. You mention "we" had an agreement on where to draw the line for "new issues". Well, what's a "new issue"? And, who's "we"? What's SOLR's process for new features versus bug fixes, etc.? I don't see it documented anywhere but I think it seems pretty clear to you what the process is. Admittedly I've only been actively monitoring the SOLR list for a few weeks now, so perhaps it was discussed on the mailing lists before at some point. Is this something that can be put up (or is already) on the Wiki? Or perhaps you can point me to it? I know in 2005 for Nutch, we had a release vote called on 0.7 which was a large release and a big time production like Solr 1.4. I also know that, at the last minute, someone mentioned that there was an RSSParser out there in patch land, that only added functionality, didn't take anything away, didn't touch core modules, and was sufficiently commit worthy (javadocs, unit tests [if applicable] , etc.) Nutch was very much in the same type of release cycle back in that day that Solr was: huge release, many months in between releases, etc., so it was just as sacrilegious back then for these folks to ask whether the RSSParser could be included in Nutch at the last minute before a release vote, and after a discussed "freeze" on features. However, the Nutch committers took a step back, discussed the pros/cons (briefly, nothing major) over a few emails and decided why not? At the time, Nutch was producing a release pretty infrequently, so getting something into a release meant it was going to be out there for more people to use, touch, test, and leverage, which was a good thing since the next release wouldn't be coming for a while. And, I think we can all agree, releases are different than trunk, or patched versions of software. In many organizations, folks just aren't comfortable with saying, "depend on the latest trunk" or "depend on release X + patch Y". Releases sound more official to management. I'm fine with this patch not getting into 1.4 and I'll drop it since no one else really seems to be commenting on this and I was just trying to see if there was interest, which I hope is something that's allowed and encouraged over here in SOLR land. Everyone wants to feel like there is opportunity to be included and to ask questions and everyone wants a chance to have their patch included in a big release that will be widely disseminated.Beyond that, thinking through this issue made me ask myself yesterday (and with my earlier comment), well, "how are the SOLR users writing these response writers?", or, "are they even doing it that much?" So, I was trying to use this issue as a placeholder to obtain feedback on that. Chris
        Hide
        Noble Paul added a comment -

        Well, what's a "new issue"? And, who's "we"? What's SOLR's process for new features versus bug fixes, etc.?

        Solr has been in the release mode for over a month. There are so many loose ends which the we (committers ) are trying to tie up before the actual release happens.

        We always welcome new features. But , adding a feature means we are going to keep it for long. So we need to ensure that we add the right feature in the right way because Solr cannot break back-compat . So, somebody is going to review it and figure out all these. There are only so many hands and they have only so much spare time to do all these .That is the reason why the last minute feature requests are discouraged .

        Show
        Noble Paul added a comment - Well, what's a "new issue"? And, who's "we"? What's SOLR's process for new features versus bug fixes, etc.? Solr has been in the release mode for over a month. There are so many loose ends which the we (committers ) are trying to tie up before the actual release happens. We always welcome new features. But , adding a feature means we are going to keep it for long. So we need to ensure that we add the right feature in the right way because Solr cannot break back-compat . So, somebody is going to review it and figure out all these. There are only so many hands and they have only so much spare time to do all these .That is the reason why the last minute feature requests are discouraged .
        Hide
        Noble Paul added a comment -

        I am looking at the patch and find it difficult to see the value this brings.

        I am in general interested in making it easier to make it easier to write a QueryResponseWriter . Let us think about how we can enhance this to have a GenericResponseWriter which can be the base for all writers

        Show
        Noble Paul added a comment - I am looking at the patch and find it difficult to see the value this brings. I am in general interested in making it easier to make it easier to write a QueryResponseWriter . Let us think about how we can enhance this to have a GenericResponseWriter which can be the base for all writers
        Hide
        Chris A. Mattmann added a comment -

        I am looking at the patch and find it difficult to see the value this brings.

        Can you elaborate on why you don't think this patch adds value? I've elaborated above on the value I think that this brings, and I would appreciate the same level of detail regarding the contrary.

        I'm +1 for actually refactoring and rewriting the QueryResponseWriter framework, making all QueryResponseWriters use an abstract base class, and functions/methods around this, simplifying the code. Furthermore, I'm also for getting rid of the abstract data structure (bag of objects) class called NamedValueList and replacing it with a more concrete type hierarchy, but I think that's another patch (or series of patches) IMHO. Furthermore, there is one succinct difference between this patch and any GenericResponseWriter that you are proposing, namely: the existing SOLR QueryResponseWriters are all means of hijacking the entire SOLR QueryResponse and writing it out in a particular format. On the other hand, my patch is focused on allowing the user to hijack simply the returned DocumentList (full set), or hijack the response on a Doc-by-Doc basis.

        Show
        Chris A. Mattmann added a comment - I am looking at the patch and find it difficult to see the value this brings. Can you elaborate on why you don't think this patch adds value? I've elaborated above on the value I think that this brings, and I would appreciate the same level of detail regarding the contrary. I'm +1 for actually refactoring and rewriting the QueryResponseWriter framework, making all QueryResponseWriters use an abstract base class, and functions/methods around this, simplifying the code. Furthermore, I'm also for getting rid of the abstract data structure (bag of objects) class called NamedValueList and replacing it with a more concrete type hierarchy, but I think that's another patch (or series of patches) IMHO. Furthermore, there is one succinct difference between this patch and any GenericResponseWriter that you are proposing, namely: the existing SOLR QueryResponseWriters are all means of hijacking the entire SOLR QueryResponse and writing it out in a particular format. On the other hand, my patch is focused on allowing the user to hijack simply the returned DocumentList (full set), or hijack the response on a Doc-by-Doc basis.
        Hide
        Noble Paul added a comment -

        Can you elaborate on why you don't think this patch adds value? I've elaborated above on the value I think that this brings, and I would appreciate the same level of detail regarding the contrary.

        This does not help the user of the API much because the real difficulty is in unmarshalling various types of objects. This patch does nothing to read the stored fields from the Document .

        I'm also for getting rid of the abstract data structure (bag of objects) class called NamedValueList and replacing it with a more concrete type hierarchy....

        That is really difficult. A lot of components write their output in a very arbitrary Object tree. The output is largely designed like a JSON object tree (with more promitives) . The producer decides what the tree contains. The good thing about this approach is that we don't need to build custom classes for every type of output.

        On the other hand, my patch is focused on allowing the user to hijack simply the returned DocumentList

        There is no reason why a GenericResponseWriter can't do that . I am not happy about putting this classes in and leading users to believe that this is all that they have to do.

        Show
        Noble Paul added a comment - Can you elaborate on why you don't think this patch adds value? I've elaborated above on the value I think that this brings, and I would appreciate the same level of detail regarding the contrary. This does not help the user of the API much because the real difficulty is in unmarshalling various types of objects. This patch does nothing to read the stored fields from the Document . I'm also for getting rid of the abstract data structure (bag of objects) class called NamedValueList and replacing it with a more concrete type hierarchy.... That is really difficult. A lot of components write their output in a very arbitrary Object tree. The output is largely designed like a JSON object tree (with more promitives) . The producer decides what the tree contains. The good thing about this approach is that we don't need to build custom classes for every type of output. On the other hand, my patch is focused on allowing the user to hijack simply the returned DocumentList There is no reason why a GenericResponseWriter can't do that . I am not happy about putting this classes in and leading users to believe that this is all that they have to do.
        Hide
        Chris A. Mattmann added a comment - - edited

        This does not help the user of the API much because the real difficulty is in unmarshalling various types of objects. This patch does nothing to read the stored fields from the Document .

        I agree with your statement above regarding "the real difficulty". That's precisely what this patch addresses. This patch deals with that real difficulty for users (of which there are plenty, please see my comment above RE: use cases, e.g., FGDC, RDF, etc.) that are mostly concerned with spitting out (for format compatibility) the resultant Documents from searches in a particular XML format. This patch isn't intended to do anything with the stored fields – that's left up to the user who extends the abstract base classes by implementing #emitDoc or #emitDocList, where the user deals with Lucene Documents. As I stated above numerous times, it took me quite a bit of printing out and deducing the structure of the resultant SolrResponse to determine where in that list Documents were stored (and in fact they weren't it i just the IDs). This isn't really documented anywhere per se (at least from what I could find with the online Javadocs or Wiki).

        That is really difficult. A lot of components write their output in a very arbitrary Object tree. The output is largely designed like a JSON object tree (with more promitives) . The producer decides what the tree contains. The good thing about this approach is that we don't need to build custom classes for every type of output.

        Why is this difficult? It would amount to components declaring what type of schema they return. Typed, bags of objects, coupled with sparse documentation isn't exactly the answer. I think we both agree that there is a larger issue to look at in terms of the SolrResponse though and QueryResponseWriters, my point is that I don't think using this issue to solve those bigger picture questions is the right answer. I'd be happy to create further issues to discuss this.

        There is no reason why a GenericResponseWriter can't do that . I am not happy about putting this classes in and leading users to believe that this is all that they have to do.

        How are we telling users that this is all they have to do? The patch specifically states (taken from the included Javadoc):

        This {@link QueryResponseWriter} allows a user to implement the {@link #emitDoc(Document, Writer)} function which acts as a callback function to process one Lucene {@link Document} returned from the SOLR Query at a time. Sub-classes should keep track of any global state as this class does not provide a means to access the entire set of returned {@link Document}s.If that functionality is required, see {@link DocumentListResponseWriter}.

        This {@link QueryResponseWriter} allows a user to implement the {@link #emitDocList(List, Writer)} function which acts as a callback function to process the entire {@link List} of Lucene {@link Document} returned from the SOLR Query at once. To process the {@link Document}s one-at-a-time (to conserve resources, or to speed up the processing/etc.), see {@link DocumentResponseWriter}.

        I'm not sure I see the concern behind this ~250 line patch? The patch:

        • adds functionality that would have simplified a number of use cases that I am leveraging SOLR for in the space and earth science data community, where formats are critical and metadata output is more important than the specific search meta-info (# hits, query time, start/end, etc.). See the 3-4 examples I stated above.
        • does not introduce anything that is not backwards compatible
        • includes javadoc on all public methods, as well as class-level javadoc
        • should apply without trouble to the current SVN trunk

        This has typically been the criteria for inclusion (modulo unit tests, which if there is concern there, I'd be happy to include) – is the criteria different here in SOLR?

        Show
        Chris A. Mattmann added a comment - - edited This does not help the user of the API much because the real difficulty is in unmarshalling various types of objects. This patch does nothing to read the stored fields from the Document . I agree with your statement above regarding "the real difficulty". That's precisely what this patch addresses. This patch deals with that real difficulty for users (of which there are plenty, please see my comment above RE: use cases, e.g., FGDC, RDF, etc.) that are mostly concerned with spitting out (for format compatibility) the resultant Documents from searches in a particular XML format. This patch isn't intended to do anything with the stored fields – that's left up to the user who extends the abstract base classes by implementing #emitDoc or #emitDocList, where the user deals with Lucene Documents. As I stated above numerous times, it took me quite a bit of printing out and deducing the structure of the resultant SolrResponse to determine where in that list Documents were stored (and in fact they weren't it i just the IDs). This isn't really documented anywhere per se (at least from what I could find with the online Javadocs or Wiki). That is really difficult. A lot of components write their output in a very arbitrary Object tree. The output is largely designed like a JSON object tree (with more promitives) . The producer decides what the tree contains. The good thing about this approach is that we don't need to build custom classes for every type of output. Why is this difficult? It would amount to components declaring what type of schema they return. Typed, bags of objects, coupled with sparse documentation isn't exactly the answer. I think we both agree that there is a larger issue to look at in terms of the SolrResponse though and QueryResponseWriters, my point is that I don't think using this issue to solve those bigger picture questions is the right answer. I'd be happy to create further issues to discuss this. There is no reason why a GenericResponseWriter can't do that . I am not happy about putting this classes in and leading users to believe that this is all that they have to do. How are we telling users that this is all they have to do? The patch specifically states (taken from the included Javadoc): This {@link QueryResponseWriter} allows a user to implement the {@link #emitDoc(Document, Writer)} function which acts as a callback function to process one Lucene {@link Document} returned from the SOLR Query at a time. Sub-classes should keep track of any global state as this class does not provide a means to access the entire set of returned {@link Document}s.If that functionality is required, see {@link DocumentListResponseWriter}. This {@link QueryResponseWriter} allows a user to implement the {@link #emitDocList(List, Writer)} function which acts as a callback function to process the entire {@link List} of Lucene {@link Document} returned from the SOLR Query at once. To process the {@link Document}s one-at-a-time (to conserve resources, or to speed up the processing/etc.), see {@link DocumentResponseWriter}. I'm not sure I see the concern behind this ~250 line patch? The patch: adds functionality that would have simplified a number of use cases that I am leveraging SOLR for in the space and earth science data community, where formats are critical and metadata output is more important than the specific search meta-info (# hits, query time, start/end, etc.). See the 3-4 examples I stated above. does not introduce anything that is not backwards compatible includes javadoc on all public methods, as well as class-level javadoc should apply without trouble to the current SVN trunk This has typically been the criteria for inclusion (modulo unit tests, which if there is concern there, I'd be happy to include) – is the criteria different here in SOLR?
        Hide
        Noble Paul added a comment -

        where formats are critical and metadata output is more important than the specific search meta-info

        hi Cris, both of us are in agreement over the intent. We need to make it easier to write out data in any format. I am thinking that it can be made better and more complete. I hope to submit a patch soon

        Show
        Noble Paul added a comment - where formats are critical and metadata output is more important than the specific search meta-info hi Cris, both of us are in agreement over the intent. We need to make it easier to write out data in any format. I am thinking that it can be made better and more complete. I hope to submit a patch soon
        Hide
        David Woollard added a comment -

        As an outside observer to this list, i think this issue has gotten a little testy and I am hoping not indicative of SOLR as a project. I have use cases similar to Chris' and I share Noble's sentiment that the purpose of this discussion should be to make it easier to write data out in any format the user needs. Noble, if you are working on a patch can you discuss alternatives to what Chris has suggested?

        Show
        David Woollard added a comment - As an outside observer to this list, i think this issue has gotten a little testy and I am hoping not indicative of SOLR as a project. I have use cases similar to Chris' and I share Noble's sentiment that the purpose of this discussion should be to make it easier to write data out in any format the user needs. Noble, if you are working on a patch can you discuss alternatives to what Chris has suggested?
        Hide
        Noble Paul added a comment -

        Most of the custom writers are less bothered about sections other than the DocList. The hard part is in reading the stored fields from lucene Documents depending on what fields are requested by the user. If the API allows to fetch the data as an Iterator<SolrDocument> w/o bothering about the low level Lucene details that would be ideal.

        Show
        Noble Paul added a comment - Most of the custom writers are less bothered about sections other than the DocList. The hard part is in reading the stored fields from lucene Documents depending on what fields are requested by the user. If the API allows to fetch the data as an Iterator<SolrDocument> w/o bothering about the low level Lucene details that would be ideal.
        Hide
        Noble Paul added a comment -

        idea as a patch (untested)

        Show
        Noble Paul added a comment - idea as a patch (untested)
        Hide
        Noble Paul added a comment -

        This adds a GenericResponseWriter which can be used to implement different writers.

        Show
        Noble Paul added a comment - This adds a GenericResponseWriter which can be used to implement different writers.
        Hide
        Chris A. Mattmann added a comment - - edited

        Most of the custom writers are less bothered about sections other than the DocList. The hard part is in reading the stored fields from lucene Documents depending on what fields are requested by the user. If the API allows to fetch the data as an Iterator<SolrDocument> w/o bothering about the low level Lucene details that would be ideal.

        This is exactly my point with this issue. I think that you and I are on the same page, Noble. I took a look at the patch you uploaded:

          public abstract void start(Writer writer);
        
          /**Start of document list
           * @param info
           */
          public abstract void startDocumentList(Writer writer, DocListInfo info);
          /**Write out a document
           * @param solrDocument
           */
          public abstract void writeDoc(Writer writer,SolrDocument solrDocument);
        
          /**End of documents
           */
          public abstract void endDocumentList(Writer writer,);
        
          /**write the header if required
           * @param responseHeader
           */
          public abstract void writeResponseHeader(Writer writer,NamedList responseHeader);
        
          public abstract void end(Writer writer);
        

        1. You include a #writeDoc function. In my patch I called this #emitDoc. Why the name change?
        2. Same goes for #startDocumentList and #endDocumentList (called #emitHeader and #emitFooter in my patch). Why the name change?
        3. #start and #end are never called in your patch?
        4. The javadoc I included in my patch is not included in yours.
        5. My patch included a means to get the whole DocumentList (in the case that aggregate formatting is required) – this is removed in your patch. Your patch includes only the equivalent of my DocumentResponseWriter.
        6. The spirit of your patch is a bit more generic than mine, e.g., with the writeOther method. +1 to that.

        Let me take a crack at merging what you put up and what I wrote. Sound good?

        Show
        Chris A. Mattmann added a comment - - edited Most of the custom writers are less bothered about sections other than the DocList. The hard part is in reading the stored fields from lucene Documents depending on what fields are requested by the user. If the API allows to fetch the data as an Iterator<SolrDocument> w/o bothering about the low level Lucene details that would be ideal. This is exactly my point with this issue. I think that you and I are on the same page, Noble. I took a look at the patch you uploaded: public abstract void start(Writer writer); /**Start of document list * @param info */ public abstract void startDocumentList(Writer writer, DocListInfo info); /**Write out a document * @param solrDocument */ public abstract void writeDoc(Writer writer,SolrDocument solrDocument); /**End of documents */ public abstract void endDocumentList(Writer writer,); /**write the header if required * @param responseHeader */ public abstract void writeResponseHeader(Writer writer,NamedList responseHeader); public abstract void end(Writer writer); 1. You include a #writeDoc function. In my patch I called this #emitDoc. Why the name change? 2. Same goes for #startDocumentList and #endDocumentList (called #emitHeader and #emitFooter in my patch). Why the name change? 3. #start and #end are never called in your patch? 4. The javadoc I included in my patch is not included in yours. 5. My patch included a means to get the whole DocumentList (in the case that aggregate formatting is required) – this is removed in your patch. Your patch includes only the equivalent of my DocumentResponseWriter. 6. The spirit of your patch is a bit more generic than mine, e.g., with the writeOther method. +1 to that. Let me take a crack at merging what you put up and what I wrote. Sound good?
        Hide
        Noble Paul added a comment -

        I started of with a clean slate. I did not use your patch for reference. The method names are all arbitrary . header and footer are not familiar in Solr's context .So I did not use it. write*() or emit*() Ii am fine with both

        As I mentioned , it is an idea as a patch and not a real patch. The final form may completely differ (including javadocs).

        I shall put up another patch.

        Show
        Noble Paul added a comment - I started of with a clean slate. I did not use your patch for reference. The method names are all arbitrary . header and footer are not familiar in Solr's context .So I did not use it. write*() or emit*() Ii am fine with both As I mentioned , it is an idea as a patch and not a real patch. The final form may completely differ (including javadocs). I shall put up another patch.
        Hide
        Chris A. Mattmann added a comment -

        I started of with a clean slate. I did not use your patch for reference.

        Why? The end result of what you attached is very similar to the original I contributed (with the exception you used SolrInputDocument rather than o.a.l.Document – why? and you added a #writeObject method).

        The method names are all arbitrary . header and footer are not familiar in Solr's context .

        What is Solr's context? Is there a glossary of typical SOLR function names? I didn't see a pattern in the existing QueryResponseWriters, not that it would have been a big deal anyways since this is really a new set of abstract base classes by which to build Document-focused response writers from.

        So I did not use it. write*() or emit*() Ii am fine with both

        +1 for emit*() since that was the original intention, and since it matches the javadoc I spent time and effort to write.

        As I mentioned , it is an idea as a patch and not a real patch. The final form may completely differ (including javadocs).

        Why is that? Is there a big delta between what I contributed and something that meets the criteria for a patch?

        I appreciate your time in reviewing this patch with me.

        Cheers,
        Chris

        Show
        Chris A. Mattmann added a comment - I started of with a clean slate. I did not use your patch for reference. Why? The end result of what you attached is very similar to the original I contributed (with the exception you used SolrInputDocument rather than o.a.l.Document – why? and you added a #writeObject method). The method names are all arbitrary . header and footer are not familiar in Solr's context . What is Solr's context? Is there a glossary of typical SOLR function names? I didn't see a pattern in the existing QueryResponseWriters, not that it would have been a big deal anyways since this is really a new set of abstract base classes by which to build Document-focused response writers from. So I did not use it. write*() or emit*() Ii am fine with both +1 for emit*() since that was the original intention, and since it matches the javadoc I spent time and effort to write. As I mentioned , it is an idea as a patch and not a real patch. The final form may completely differ (including javadocs). Why is that? Is there a big delta between what I contributed and something that meets the criteria for a patch? I appreciate your time in reviewing this patch with me. Cheers, Chris
        Hide
        Noble Paul added a comment -

        next idea as patch. incorporating your suggestions

        Show
        Noble Paul added a comment - next idea as patch. incorporating your suggestions
        Hide
        Chris A. Mattmann added a comment -
        • updated patch:
        • merged my javadoc
        • cleaned up formatting
        • added javadoc

        My +1 to commit this...

        Show
        Chris A. Mattmann added a comment - updated patch: merged my javadoc cleaned up formatting added javadoc My +1 to commit this...
        Hide
        Noble Paul added a comment -

        committed r884037

        Thanks Mattman. removed author tags because Lucene has a policy against it

        Show
        Noble Paul added a comment - committed r884037 Thanks Mattman. removed author tags because Lucene has a policy against it
        Hide
        Chris A. Mattmann added a comment -

        Thanks, Noble!

        Show
        Chris A. Mattmann added a comment - Thanks, Noble!
        Hide
        Noble Paul added a comment -

        some API changes

        Show
        Noble Paul added a comment - some API changes
        Hide
        Noble Paul added a comment -

        it was not possible to write multiple DocList

        Show
        Noble Paul added a comment - it was not possible to write multiple DocList
        Hide
        Hoss Man added a comment -

        Correcting Fix Version based on CHANGES.txt, see this thread for more details...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Show
        Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1.0 release

          People

          • Assignee:
            Noble Paul
            Reporter:
            Chris A. Mattmann
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development