Solr / SOLR-2970

CSV ResponseWriter returns fields defined as stored=false in schema

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: Response Writers
    • Labels:
      None

      Description

      1. Add &wt=csv to a query
      2. You get columns for non-stored fields

      1. SOLR-2970.patch
        0.5 kB
        Jan Høydahl
      2. SOLR-2970.patch
        1 kB
        Jan Høydahl
      3. SOLR-2970.patch
        3 kB
        Jan Høydahl
      4. SOLR-2970.patch
        3 kB
        Jan Høydahl
      5. SOLR-2970-3x.patch
        3 kB
        Jan Høydahl
      6. SOLR-2970-3x-fixtest.patch
        2 kB
        Jan Høydahl

          Activity

          Jan Høydahl added a comment -

          This simple patch (for trunk) fixes it

          Yonik Seeley added a comment -

          The other side of the coin is that if a client asks for x,y,z they may be expecting those columns in that order.

          IMO, the problem is more about CSV not having a native representation for null/missing (i.e. if you ask for a sparse field, you will get zero length strings for the missing values too).
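
A minimal sketch of the point about null/missing values, using Python's standard `csv` module rather than Solr's own writer (an assumption made purely for illustration). The helper name `to_csv_line` is hypothetical. It shows that a missing value and a zero-length string serialize to the same CSV text, so a consumer cannot tell them apart.

```python
import csv
import io


def to_csv_line(values):
    """Serialize one row with the csv module and return it without the line terminator."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue().rstrip("\n")


# A missing value (None) and an empty string produce identical output.
missing = to_csv_line(["a1", None, "author"])
empty = to_csv_line(["a1", "", "author"])
print(missing)           # a1,,author
print(missing == empty)  # True
```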

          Jan Høydahl added a comment -

          The same solution seems to be valid for 3.x.
          I don't see any side effects of always skipping fields which are not stored?

          Jan Høydahl added a comment -

          Take the example:

          Query: http://localhost:8983/solr/ac/select/?q=*%3A*&wt=csv
          id,subtext,textphon,textng,score,action,value,textnge,type,textsuggest,popularity
          a1,"Born 1898, author of Narnia",,,1023.9588,search,author_facet,,author,C.S. Lewis,256
          a2,Swedish fake author,,,799.97003,search,author_facet,,author,Carl Larsson,200
          a3,Norwegian famous author,,,359.992,search,author_facet,,author,Petter Dass,90
          

          Fields textphon, textng and textnge are not stored. After the patch, we get:

          Query: http://localhost:8983/solr/ac/select/?q=*%3A*&wt=csv
          id,subtext,score,action,value,type,textsuggest,popularity
          a1,"Born 1898, author of Narnia",1023.9588,search,author_facet,author,C.S. Lewis,256
          a2,Swedish fake author,799.97003,search,author_facet,author,Carl Larsson,200
          a3,Norwegian famous author,359.992,search,author_facet,author,Petter Dass,90
          

          You can still ask for specific fields in any order:

          Query: http://localhost:8983/solr/ac/select?q=*:*&wt=csv&fl=type,subtext,id
          type,subtext,id
          author,"Born 1898, author of Narnia",a1
          author,Swedish fake author,a2
          author,Norwegian famous author,a3
          

          Is it ever correct to return non-stored fields?
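
As a rough client-side check of the pre-patch output quoted above, the following sketch lists the columns that are empty in every row. It uses Python's standard `csv` module; the function name `always_empty_columns` is hypothetical. A column being empty in a sample does not prove the field is non-stored, so treat this as a diagnostic aid only.

```python
import csv
import io

# Pre-patch response copied from the example in this comment.
BEFORE = '''id,subtext,textphon,textng,score,action,value,textnge,type,textsuggest,popularity
a1,"Born 1898, author of Narnia",,,1023.9588,search,author_facet,,author,C.S. Lewis,256
a2,Swedish fake author,,,799.97003,search,author_facet,,author,Carl Larsson,200
a3,Norwegian famous author,,,359.992,search,author_facet,,author,Petter Dass,90
'''


def always_empty_columns(csv_text):
    """Return the header names whose value is empty in every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = reader.fieldnames or []
    return [name for name in header if all(row[name] == "" for row in rows)]


print(always_empty_columns(BEFORE))  # ['textphon', 'textng', 'textnge']
```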

          Chris A. Mattmann added a comment -

          I ran into this when I was putting together the original patch for SOLR-1925, and I believe that the order of the requested fields is indeed important. And yes, as Yonik put it, CSV doesn't seem to have a native representation for null.

          What about a configuration parameter (perhaps attached to the request) to identify a "placeholder" for missing or non-stored values? We see this a lot in science data, when the data quality flag specifies that we should put in a fill value.
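
A sketch of the placeholder idea, written against Python's standard `csv` module rather than Solr. The function `write_csv` and its `null_value` argument are hypothetical names chosen for this illustration; nothing here is taken from the patches on this issue. Fields are emitted in the requested order, and any field a document lacks is replaced with the fill value.

```python
import csv
import io


def write_csv(docs, fields, null_value=""):
    """Write docs as CSV in the given field order, using null_value for missing fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for doc in docs:
        row = []
        for name in fields:
            value = doc.get(name)
            # Only a truly missing value gets the placeholder; "" is kept as-is.
            row.append(null_value if value is None else value)
        writer.writerow(row)
    return buf.getvalue()


docs = [{"id": "a1", "subtext": ""}, {"id": "a2"}]
print(write_csv(docs, ["id", "subtext"], null_value="NULL"))
```

With a non-empty fill value, the empty string in the first document and the missing field in the second become distinguishable in the output.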

          Yonik Seeley added a comment -

          "Query: http://localhost:8983/solr/ac/select/?q=*%3A*&wt=csv"

          Ah, thanks for the example, Jan. I didn't realize those non-stored fields were returned by default!

          "Is it ever correct to return non-stored fields?"

          If the user explicitly asks for the non-stored field, it can be OK to still return it. It's less a matter of correctness and more a matter of what's most useful.

          Jan Høydahl added a comment -

          So there are two distinct issues here:
          A) The CSV writer attempts to return fields which are not stored, which is bound to always return a blank value
          B) When returning a stored field which has no value (null) for a document, there is no way to distinguish that from the empty string case

          I think B) should have its own JIRA.

          For A), I still cannot see any use case where you want to ask for a non-stored field in the CSV output, even if you ask for it explicitly, since it will always be null/empty. If you ask the XML or JSON writer for a non-stored field, you never get it no matter what; it formally does not exist in this context.

          Jan Høydahl added a comment -

          Created SOLR-2974 for B)

          I see that today you can explicitly specify non-existing fields in field-list and they will all end up in the response, e.g. &wt=csv&fl=id,type,foo,nonexisting

          If this is useful in some cases then let's continue to support it, coupled with the improvement in SOLR-2974.
          But in the default case, when "fl" is not specified or is specified as "*", the CSV writer should output only the stored fields.

          Jan Høydahl added a comment -

          Updated patch which will respect explicit "fl" but output only stored fields for the default or "*" case
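
The rule described here can be sketched as follows. This is an illustration in Python, not the code from the attached patch: the `SCHEMA` dictionary and the function `select_csv_columns` are hypothetical. `textphon` and `textng` are marked non-stored as stated earlier in this thread, and the remaining fields are assumed to be stored.

```python
# Hypothetical schema: field name -> properties.
SCHEMA = {
    "id": {"stored": True},
    "subtext": {"stored": True},
    "textphon": {"stored": False},
    "textng": {"stored": False},
    "type": {"stored": True},
}


def select_csv_columns(fl, schema):
    """Pick CSV columns: honor an explicit fl, else fall back to stored fields only."""
    requested = [name.strip() for name in (fl or "").split(",") if name.strip()]
    if not requested or requested == ["*"]:
        # Default or "*": only fields that can actually carry a value.
        return [name for name, props in schema.items() if props["stored"]]
    # Explicit list: keep the caller's names and order, even for non-stored fields.
    return requested


print(select_csv_columns(None, SCHEMA))                # ['id', 'subtext', 'type']
print(select_csv_columns("*", SCHEMA))                 # ['id', 'subtext', 'type']
print(select_csv_columns("type,textphon,id", SCHEMA))  # ['type', 'textphon', 'id']
```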

          Jan Høydahl added a comment -

          Hmm, it seems the CSV writer does not support returning functions as fields either.
          http://localhost:8983/solr/ac/select/?q=*%3A*&wt=xml&fl=log(popularity) works, but
          http://localhost:8983/solr/ac/select/?q=*%3A*&wt=csv&fl=log(popularity) does not.
          But I can spin that off as a separate issue as well, and focus on fixing this one bug.

          Dan Hertz added a comment -

          When using fl=*, is it possible to include an option to output all the field data (stored and not stored)? I ask, because the CSV ResponseWriter is very useful to back up and/or transfer data between solr indexes and spreadsheets/databases.

          Perhaps default to stored fields, but provide an option to export all fields (without having to specify each field name)?

          Jan Høydahl added a comment -

          Dan, the problem is that non-stored fields are not stored, so they cannot be output. Even if it were possible to un-invert indexed fields, that is, IMHO, not a task for the CSV writer.

          Jan Høydahl added a comment -

          Updated patch with test case.
          I think this is approaching ready for commit. Comments?

          Jan Høydahl added a comment -

          Added CHANGES.txt entry to patch (assuming it will be committed in 3.x)

          Jan Høydahl added a comment -

          Patch for 3.x

          Jan Høydahl added a comment -

          Committed to trunk and 3x

          Simon Willnauer added a comment -

          Jan, I think we just had a failure related to this commit. Can you look into this: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1570/

          Jan Høydahl added a comment -

          Looking...

          Jan Høydahl added a comment -

          Committed the attached fix to the test. On my MacBook, the order of fields for "*" was different from the order on the build server, so the test passed locally.
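
One general way to make a check robust against this kind of column-order difference is to compare parsed records instead of raw CSV text. The sketch below uses Python's standard `csv` module and a hypothetical helper `csv_as_records`. It is not the fix that was committed for this issue, only an illustration of the idea.

```python
import csv
import io


def csv_as_records(csv_text):
    """Parse CSV text into a list of {column: value} dicts, one per data row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


# Same data, different column order, as might happen on two machines.
run_a = "id,type\na1,author\n"
run_b = "type,id\nauthor,a1\n"

print(run_a == run_b)                                  # False
print(csv_as_records(run_a) == csv_as_records(run_b))  # True
```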

          Jan Høydahl added a comment -

          Committed test fix to trunk, rev 1306025


            People

            • Assignee:
              Jan Høydahl
              Reporter:
              Jan Høydahl