Solr
  1. Solr
  2. SOLR-5423

CSV output doesn't include function field

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.4
    • Fix Version/s: 4.7.1, 4.8, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Given a schema with

      <field name="price" type="float" indexed="true" stored="true"/>
      <field name='numpages' type='int' indexed='true' stored='true'/>

      the following query returns no rows:

      http://localhost:8983/solr/collection1/select?q=*%3A*&rows=30&fl=div(price%2Cnumpages)&wt=csv&indent=true

      However, setting wt=json or wt=xml, it works.

      1. SOLR-5423.patch
        9 kB
        Steve Rowe
      2. SOLR-5423.patch
        4 kB
        Arun Kumar

        Issue Links

          Activity

          Hide
          Arun Kumar added a comment -

          Fix for the csv output doesn't work for function fields

          Show
          Arun Kumar added a comment - Fix for the csv output doesn't work for function fields
          Hide
          Arun Kumar added a comment -

          I have fixed this issue. Attached the patch file for the fix.

          Show
          Arun Kumar added a comment - I have fixed this issue. Attached the patch file for the fix.
          Hide
          Hoss Man added a comment -

          Hey Arun, some comments about your patch:

          • please generate patches against he entire code base so they are easier to apply & review: https://wiki.apache.org/solr/HowToContribute#Generating_a_patch
          • can you explain what's going on with the getOriginalNameForFunctionField function in your patch? ... it seems extremely sketchy. if the problem is just that ValueSourceAugmenter's "getName" doesn't match the actual fieldname the transformer puts in the modified documents, we should fix that in ValueSourceAugmenter
          • in general, the amount of instanceof checking going on in your patch seems really brittle, and i'm not exactly sure why it's neccessary. As I understand it the CSVResponseWriter already loops over all of the field names in the documents to be returned to get the list of column names it should output – so as long as the transformer logic is applied before that, then won't the field names be picked up automatically?

          independent of any of the above questions about this patch, we shouldn't move forward with committing support for transformers to the csv repsonse writer w/o having some tests that prove it works.

          Show
          Hoss Man added a comment - Hey Arun, some comments about your patch: please generate patches against he entire code base so they are easier to apply & review: https://wiki.apache.org/solr/HowToContribute#Generating_a_patch can you explain what's going on with the getOriginalNameForFunctionField function in your patch? ... it seems extremely sketchy. if the problem is just that ValueSourceAugmenter's "getName" doesn't match the actual fieldname the transformer puts in the modified documents, we should fix that in ValueSourceAugmenter in general, the amount of instanceof checking going on in your patch seems really brittle, and i'm not exactly sure why it's neccessary. As I understand it the CSVResponseWriter already loops over all of the field names in the documents to be returned to get the list of column names it should output – so as long as the transformer logic is applied before that, then won't the field names be picked up automatically? independent of any of the above questions about this patch, we shouldn't move forward with committing support for transformers to the csv repsonse writer w/o having some tests that prove it works.
          Hide
          Arun Kumar added a comment -

          Generated the patch at the project root level with proper naming convention

          Show
          Arun Kumar added a comment - Generated the patch at the project root level with proper naming convention
          Hide
          Arun Kumar added a comment -

          Hi Hoss,

          Thanks for your review. I have attached the patch generated at the root level and followed the naming convention for the file.

          Here are my inputs on your other comments:

          The reason for including a new method getOriginalNameForFunctionField is the ValueSourceAugmenter.getName() returns the name with a function prefix to it, as like this.

          public String getName()

          { return "function("+name+")"; }

          We can't change this method directly as it has been referred in few other places in the code, it may break other places. Even I thought of including a new method in ValueSourceAugmenter as getOriginalName()

          {return name;}

          but for this we have to change the abstract class DocTransformer and should implement this new method in all the child classes which are extending DocTransformer. So I dropped this option and wrote a method specific to CSVResponseWriter.

          On number of instanceof used, in writeResponse method of CSVResponseWriter the responseObj type is coming as ResultContext only when we have functions in the query. so I am filtering the ResultContext instance and then the other two instanceof is required when one function is present in the query then the transformer type is of ValueSourceAugmenter and if we have multiple functions in the same query then the transformer type is DocTransformers (array of ValueSourceAugmenter) so we need to check these instanceof to process accordingly. In general the transformer logic followed for xml/json is different than the csv. In csv we don't have headers and the format also simple. Because of this difference the transformers are not considering the function field as a field for the csv type. Please let me know if there are any other better way to handle this.

          On the test case, yes I can write a unit test case to prove this fix works.

          Thanks,
          Arun

          Show
          Arun Kumar added a comment - Hi Hoss, Thanks for your review. I have attached the patch generated at the root level and followed the naming convention for the file. Here are my inputs on your other comments: The reason for including a new method getOriginalNameForFunctionField is the ValueSourceAugmenter.getName() returns the name with a function prefix to it, as like this. public String getName() { return "function("+name+")"; } We can't change this method directly as it has been referred in few other places in the code, it may break other places. Even I thought of including a new method in ValueSourceAugmenter as getOriginalName() {return name;} but for this we have to change the abstract class DocTransformer and should implement this new method in all the child classes which are extending DocTransformer. So I dropped this option and wrote a method specific to CSVResponseWriter. On number of instanceof used, in writeResponse method of CSVResponseWriter the responseObj type is coming as ResultContext only when we have functions in the query. so I am filtering the ResultContext instance and then the other two instanceof is required when one function is present in the query then the transformer type is of ValueSourceAugmenter and if we have multiple functions in the same query then the transformer type is DocTransformers (array of ValueSourceAugmenter) so we need to check these instanceof to process accordingly. In general the transformer logic followed for xml/json is different than the csv. In csv we don't have headers and the format also simple. Because of this difference the transformers are not considering the function field as a field for the csv type. Please let me know if there are any other better way to handle this. On the test case, yes I can write a unit test case to prove this fix works. Thanks, Arun
          Hide
          Arun Kumar added a comment -

          Attached the updated patch with a supporting unit test to test the fix.

          Show
          Arun Kumar added a comment - Attached the updated patch with a supporting unit test to test the fix.
          Hide
          Steve Rowe added a comment -

          Hi Arun,

          1. About getOriginalNameForFunctionField() - Hoss wrote:

          if the problem is just that ValueSourceAugmenter's "getName" doesn't match the actual fieldname the transformer puts in the modified documents, we should fix that in ValueSourceAugmenter

          and then you wrote:

          We can't change this method directly as it has been referred in few other places in the code, it may break other places.

          That's not a problem, you can fix them in those other places.

          2. About instanceof - Hoss wrote:

          in general, the amount of instanceof checking going on in your patch seems really brittle, and i'm not exactly sure why it's neccessary. As I understand it the CSVResponseWriter already loops over all of the field names in the documents to be returned to get the list of column names it should output – so as long as the transformer logic is applied before that, then won't the field names be picked up automatically?

          If you look at the loop Hoss describes, where fields are pulled from the response documents (starting at line #244 in your patched version of CSVResponseWriter.java), you shouldn't need to do any of the transformer interrogation you're doing at all. Maybe the condition around the loop needs to be expanded to include cases when there are function fields (returnFields.getTransformer() != null or something like that)?

          3. Unit tests: you added one case, but I'd suggest you add a few more, e.g. the one in this issue's description, where there is no field renaming.

          Show
          Steve Rowe added a comment - Hi Arun, 1. About getOriginalNameForFunctionField() - Hoss wrote: if the problem is just that ValueSourceAugmenter's "getName" doesn't match the actual fieldname the transformer puts in the modified documents, we should fix that in ValueSourceAugmenter and then you wrote: We can't change this method directly as it has been referred in few other places in the code, it may break other places. That's not a problem, you can fix them in those other places. 2. About instanceof - Hoss wrote: in general, the amount of instanceof checking going on in your patch seems really brittle, and i'm not exactly sure why it's neccessary. As I understand it the CSVResponseWriter already loops over all of the field names in the documents to be returned to get the list of column names it should output – so as long as the transformer logic is applied before that, then won't the field names be picked up automatically? If you look at the loop Hoss describes, where fields are pulled from the response documents (starting at line #244 in your patched version of CSVResponseWriter.java ), you shouldn't need to do any of the transformer interrogation you're doing at all. Maybe the condition around the loop needs to be expanded to include cases when there are function fields ( returnFields.getTransformer() != null or something like that)? 3. Unit tests: you added one case, but I'd suggest you add a few more, e.g. the one in this issue's description, where there is no field renaming.
          Hide
          Arun Kumar added a comment -

          Hi Steve,

          Thanks for your review, I have attached an updated patch.

          1. I removed the new method getOriginalNameForFunctionField() instead as you suggested I updated the getName() of ValueSourceAugmenter. There are no impacts because of this change, I tested the unit test case
          covering this ValueSourceAugmenter.

          2. Made correction as you & Hoss suggested

          3. Covered additional unit test without alias names and single/multiple functions in query

          -Arun

          Show
          Arun Kumar added a comment - Hi Steve, Thanks for your review, I have attached an updated patch. 1. I removed the new method getOriginalNameForFunctionField() instead as you suggested I updated the getName() of ValueSourceAugmenter. There are no impacts because of this change, I tested the unit test case covering this ValueSourceAugmenter. 2. Made correction as you & Hoss suggested 3. Covered additional unit test without alias names and single/multiple functions in query -Arun
          Hide
          Steve Rowe added a comment -

          Hi Arun,

          When you post patches, you don't need to remove older versions - in fact, you shouldn't - JIRA will gray out older same-named files, and people can see the evolution of proposed changes. You can see that with the patch file I'm attaching now - your most recent one and mine are both there. If you hover your mouse pointer over an attachment file name, you can see some metadata about it in the tooltip: who posted it and when.

          In looking at your patch, I noticed that your solution of appending fields by getting their names from the corresponding transformers would put fields in the wrong order if a function pseudofield is not last in the requested fields list. I reordered the requested fields in one of your tests to illustrate this problem.

          Hoss's and my comments about the get-all-fields-from-the-docs loop already having access to all fields was wrong, because the transformation logic happens later, at the end of CSVResponseWriter.writeResponse(), in the call to TextResponseWriter.writeDocuments(). Even if we had been right, appending fields in an order different from the requested order would have been wrong.

          So the field ordering problem has to be fixed based on the actual requested fields list. The attached patch handles this in SolrReturnFields, where the requested fields list is parsed, by putting function pseudofields into the (ordered) requested field names - at present, the only consumer of this is CSVWriter.writeResponse(). This solution doesn't need to change CSVResponseWriter itself at all. I also added a test to ReturnFieldsTest.

          I think this is ready to go.

          Show
          Steve Rowe added a comment - Hi Arun, When you post patches, you don't need to remove older versions - in fact, you shouldn't - JIRA will gray out older same-named files, and people can see the evolution of proposed changes. You can see that with the patch file I'm attaching now - your most recent one and mine are both there. If you hover your mouse pointer over an attachment file name, you can see some metadata about it in the tooltip: who posted it and when. In looking at your patch, I noticed that your solution of appending fields by getting their names from the corresponding transformers would put fields in the wrong order if a function pseudofield is not last in the requested fields list. I reordered the requested fields in one of your tests to illustrate this problem. Hoss's and my comments about the get-all-fields-from-the-docs loop already having access to all fields was wrong, because the transformation logic happens later, at the end of CSVResponseWriter.writeResponse() , in the call to TextResponseWriter.writeDocuments() . Even if we had been right, appending fields in an order different from the requested order would have been wrong. So the field ordering problem has to be fixed based on the actual requested fields list. The attached patch handles this in SolrReturnFields , where the requested fields list is parsed, by putting function pseudofields into the (ordered) requested field names - at present, the only consumer of this is CSVWriter.writeResponse() . This solution doesn't need to change CSVResponseWriter itself at all. I also added a test to ReturnFieldsTest . I think this is ready to go.
          Hide
          ASF subversion and git services added a comment -

          Commit 1571505 from Steve Rowe in branch 'dev/trunk'
          [ https://svn.apache.org/r1571505 ]

          SOLR-5423: CSV output doesn't include function field

          Show
          ASF subversion and git services added a comment - Commit 1571505 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1571505 ] SOLR-5423 : CSV output doesn't include function field
          Hide
          ASF subversion and git services added a comment -

          Commit 1571559 from Steve Rowe in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1571559 ]

          SOLR-5423: CSV output doesn't include function field (merged trunk r1571505)

          Show
          ASF subversion and git services added a comment - Commit 1571559 from Steve Rowe in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1571559 ] SOLR-5423 : CSV output doesn't include function field (merged trunk r1571505)
          Hide
          Steve Rowe added a comment -

          Committed to trunk and branch_4x.

          Show
          Steve Rowe added a comment - Committed to trunk and branch_4x.
          Hide
          Steve Rowe added a comment -

          I'll backport this to 4.7.1

          Show
          Steve Rowe added a comment - I'll backport this to 4.7.1
          Hide
          ASF subversion and git services added a comment -

          Commit 1580965 from Steve Rowe in branch 'dev/branches/lucene_solr_4_7'
          [ https://svn.apache.org/r1580965 ]

          SOLR-5423: CSV output doesn't include function field (merged branch_4x r1571559)

          Show
          ASF subversion and git services added a comment - Commit 1580965 from Steve Rowe in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1580965 ] SOLR-5423 : CSV output doesn't include function field (merged branch_4x r1571559)
          Hide
          ASF subversion and git services added a comment -

          Commit 1580967 from Steve Rowe in branch 'dev/trunk'
          [ https://svn.apache.org/r1580967 ]

          SOLR-5423: move CHANGES.txt entry to 4.7.1 section

          Show
          ASF subversion and git services added a comment - Commit 1580967 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1580967 ] SOLR-5423 : move CHANGES.txt entry to 4.7.1 section
          Hide
          ASF subversion and git services added a comment -

          Commit 1580969 from Steve Rowe in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1580969 ]

          SOLR-5423: move CHANGES.txt entry to 4.7.1 section (merged trunk r1580967)

          Show
          ASF subversion and git services added a comment - Commit 1580969 from Steve Rowe in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1580969 ] SOLR-5423 : move CHANGES.txt entry to 4.7.1 section (merged trunk r1580967)
          Hide
          Steve Rowe added a comment -

          Bulk close 4.7.1 issues

          Show
          Steve Rowe added a comment - Bulk close 4.7.1 issues

            People

            • Assignee:
              Steve Rowe
              Reporter:
              James Wilson
            • Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development