Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9127

XLSX response writer - do we want it?

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3
    • Component/s: Response Writers
    • Labels:
      None

      Description

      I recently open sourced an XLSX response writer based on solr 4.6 and apache poi.

      https://github.com/desultir/SolrXLSXResponseWriter

      Is this something the community would be interested in bringing into the solr codebase? I'm willing to put the work into porting it to solr5 and solr6 if the community is interested, happy to leave it as a plugin otherwise.

      1. 9127-xlsxresponsewriter.patch
        27 kB
        Tony Moriarty
      2. SOLR-9127.patch
        28 kB
        Noble Paul

        Issue Links

          Activity

          Hide
          noble.paul Noble Paul added a comment -

          How can we comment unless we know what it brings to the table. How about an example?

          Show
          noble.paul Noble Paul added a comment - How can we comment unless we know what it brings to the table. How about an example?
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hello Tony Moriarty - thanks for proposing an additional response writer.

          I see that Apache POI is described as "the Java API for Microsoft Documents". Would be good to have a small example here in this ticket, perhaps similar to the https://cwiki.apache.org/confluence/display/solr/Response+Writers docs?

          In terms of code, the README says

          ... Edit the map in NiceMetadataNames.java to provide the XLSXResponseWriter a lookup for translating your ugly metadatanames into user friendly text strings. ...

          Wondering what that might look like as a non-plugin e.g.

          • would the mappings be configured for the response writer (similar to how XSLTResponseWriter configures a xsltCacheLifetimeSeconds setting)?
          • would the mappings be parameters to the response writer (similar to how CSVResponseWriter takes csv.separator and other parameters)?
          • something else?
          Show
          cpoerschke Christine Poerschke added a comment - Hello Tony Moriarty - thanks for proposing an additional response writer. I see that Apache POI is described as "the Java API for Microsoft Documents". Would be good to have a small example here in this ticket, perhaps similar to the https://cwiki.apache.org/confluence/display/solr/Response+Writers docs? In terms of code, the README says ... Edit the map in NiceMetadataNames.java to provide the XLSXResponseWriter a lookup for translating your ugly metadatanames into user friendly text strings. ... Wondering what that might look like as a non-plugin e.g. would the mappings be configured for the response writer (similar to how XSLTResponseWriter configures a xsltCacheLifetimeSeconds setting)? would the mappings be parameters to the response writer (similar to how CSVResponseWriter takes csv.separator and other parameters)? something else?
          Hide
          desultir Tony Moriarty added a comment -

          Sorry - I definitely should have provided more information. The functionality in many ways mirrors CSVResponseWriter so i've based my example on that.

          The XLSX response writer returns a list of documents in XLSX format. Other information that would normally be included in a response, such as facet information, is excluded.
          The XSLX response writer supports multi-valued fields, as well as pseudo-fields and prints a header row with "nice" names for each field.

          Column Width and mapping of metadata field names to nice names is configured for the responsewriter; this can be done in the solrconfig (I will implement this).

          <queryResponseWriter name="xlsx"
          class="org.apache.solr.request.XLSXResponseWriter">
          <lst name="columnWidths">
          <int name="id">8</int>
          <int name="name">128</int>
          ...
          </lst>
          <lst name="columnNames">
          <str name="id">Product ID</str>
          <str name="name">Product Name</str>
          ...
          </lst>
          </queryResponseWriter>

          The list of fields to be output is a parameter to the response writer, passed through by the requesthandler ie
          http://localhost:8983/solr/techproducts/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=xlsx returns an XLSX file; screenshot here:
          http://i.imgur.com/PA6ihPy.png

          Formatting is currently fixed but POI provides many options - a future extension could be to expose this in responsewriter config.

          This will create a dependency on newest stable POI (3.14)
          https://poi.apache.org/

          Show
          desultir Tony Moriarty added a comment - Sorry - I definitely should have provided more information. The functionality in many ways mirrors CSVResponseWriter so i've based my example on that. The XLSX response writer returns a list of documents in XLSX format. Other information that would normally be included in a response, such as facet information, is excluded. The XSLX response writer supports multi-valued fields, as well as pseudo-fields and prints a header row with "nice" names for each field. Column Width and mapping of metadata field names to nice names is configured for the responsewriter; this can be done in the solrconfig (I will implement this). <queryResponseWriter name="xlsx" class="org.apache.solr.request.XLSXResponseWriter"> <lst name="columnWidths"> <int name="id">8</int> <int name="name">128</int> ... </lst> <lst name="columnNames"> <str name="id">Product ID</str> <str name="name">Product Name</str> ... </lst> </queryResponseWriter> The list of fields to be output is a parameter to the response writer, passed through by the requesthandler ie http://localhost:8983/solr/techproducts/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=xlsx returns an XLSX file; screenshot here: http://i.imgur.com/PA6ihPy.png Formatting is currently fixed but POI provides many options - a future extension could be to expose this in responsewriter config. This will create a dependency on newest stable POI (3.14) https://poi.apache.org/
          Hide
          cpoerschke Christine Poerschke added a comment -

          ... create a dependency on newest stable POI (3.14)

          Looks like lucene/ivy-versions.properties currently has a dependency on 3.11 org.apache.poi version.

          org.apache.poi.version = 3.11
          /org.apache.poi/poi = ${org.apache.poi.version}
          /org.apache.poi/poi-ooxml = ${org.apache.poi.version}
          /org.apache.poi/poi-ooxml-schemas = ${org.apache.poi.version}
          /org.apache.poi/poi-scratchpad = ${org.apache.poi.version}
          
          Show
          cpoerschke Christine Poerschke added a comment - ... create a dependency on newest stable POI (3.14) Looks like lucene/ivy-versions.properties currently has a dependency on 3.11 org.apache.poi version. org.apache.poi.version = 3.11 /org.apache.poi/poi = ${org.apache.poi.version} /org.apache.poi/poi-ooxml = ${org.apache.poi.version} /org.apache.poi/poi-ooxml-schemas = ${org.apache.poi.version} /org.apache.poi/poi-scratchpad = ${org.apache.poi.version}
          Hide
          noble.paul Noble Paul added a comment -

          If this does not need any extra dependencies it is not a problem.

          Show
          noble.paul Noble Paul added a comment - If this does not need any extra dependencies it is not a problem.
          Hide
          desultir Tony Moriarty added a comment -

          fantastic; i've forked from master on github and i'll get to work

          Show
          desultir Tony Moriarty added a comment - fantastic; i've forked from master on github and i'll get to work
          Hide
          janhoy Jan Høydahl added a comment -

          I don't really see why this needs to be part of Solr, but rather belongs as part of the client application? I mean - end-users wearing suits (read: Excel-folks) should not need to query the Solr API directly anyway, should they?

          Excel can perfectly well open CSV files too, so if we provide a parameter to let the CSV writer add headers Content-type: text/csv and Content-Disposition: attachment; filename="solr-results.csv" Excel users could have the CSV file open directly in Excel, and provide other column names using the already existing feature fl=Product_ID:id,Product_Name:name....

          I imagine that if we start adding spreadsheet formatting options to such a component, there will be no end to the flow of wishes people have wrt colors, fonts, formatting etc. And what about OpenOffice users, not to mention Apple Numbers? They too need their own WT, don't they? Not to mention those who prefer a nice Word table....
          Because of this, for me it is -0 to include in Solr and +1 to leave it as a 3rd party plugin. Other committers may see it differently...

          Aside from that, wrt config, would it make sense to allow overriding settings per request, something like f.id.xlsx.colname=Product ID&f.name.xlsx.colname=Product Name&f.id.xlsx.colwidth=8&f.name.xlsx.colwidth=128?

          Show
          janhoy Jan Høydahl added a comment - I don't really see why this needs to be part of Solr, but rather belongs as part of the client application? I mean - end-users wearing suits (read: Excel-folks) should not need to query the Solr API directly anyway, should they? Excel can perfectly well open CSV files too, so if we provide a parameter to let the CSV writer add headers Content-type: text/csv and Content-Disposition: attachment; filename="solr-results.csv" Excel users could have the CSV file open directly in Excel, and provide other column names using the already existing feature fl=Product_ID:id,Product_Name:name... . I imagine that if we start adding spreadsheet formatting options to such a component, there will be no end to the flow of wishes people have wrt colors, fonts, formatting etc. And what about OpenOffice users, not to mention Apple Numbers? They too need their own WT, don't they? Not to mention those who prefer a nice Word table.... Because of this, for me it is -0 to include in Solr and +1 to leave it as a 3rd party plugin. Other committers may see it differently... Aside from that, wrt config, would it make sense to allow overriding settings per request, something like f.id.xlsx.colname=Product ID&f.name.xlsx.colname=Product Name&f.id.xlsx.colwidth=8&f.name.xlsx.colwidth=128 ?
          Hide
          noble.paul Noble Paul added a comment - - edited

          If the component is 100 lines of code, it is not going to add too much to the binaries that we ship. So, it's no big deal.

          Show
          noble.paul Noble Paul added a comment - - edited If the component is 100 lines of code, it is not going to add too much to the binaries that we ship. So, it's no big deal.
          Hide
          janhoy Jan Høydahl added a comment -

          I'm not worried about bytes, but about bloat and maintenance burden. I can see the utility for the component for development purposes, where you could manually export a result set to Excel for pretty browsing (you could achieve a similar goal with a Velocity or XSLT template or CSV wt btw), but I don't see how excel fits in production on the REST API end – machines talk to Solr, not end users...

          Show
          janhoy Jan Høydahl added a comment - I'm not worried about bytes, but about bloat and maintenance burden. I can see the utility for the component for development purposes, where you could manually export a result set to Excel for pretty browsing (you could achieve a similar goal with a Velocity or XSLT template or CSV wt btw), but I don't see how excel fits in production on the REST API end – machines talk to Solr, not end users...
          Hide
          noble.paul Noble Paul added a comment -

          I can see the utility for the component for development purposes, where you could manually export a result set to Excel for pretty browsing

          We can't ask users to jump through hoops to achieve mundane tasks. Most of the Excel users would not be inclined to write code

          machines talk to Solr, not end users.

          I'm not sure about that. A lot of users use Solr for data analysis. Excel is a great tool for that.

          I'm not worried about bytes, but about bloat and maintenance burden.

          The response writers are very small components. Even if there is a bug in the component it is not going to impact the rest of the system

          I may not use this , but I can totally see how it could be very useful to users.

          Show
          noble.paul Noble Paul added a comment - I can see the utility for the component for development purposes, where you could manually export a result set to Excel for pretty browsing We can't ask users to jump through hoops to achieve mundane tasks. Most of the Excel users would not be inclined to write code machines talk to Solr, not end users. I'm not sure about that. A lot of users use Solr for data analysis. Excel is a great tool for that. I'm not worried about bytes, but about bloat and maintenance burden. The response writers are very small components. Even if there is a bug in the component it is not going to impact the rest of the system I may not use this , but I can totally see how it could be very useful to users.
          Hide
          janhoy Jan Høydahl added a comment -

          Ok, don't let me stand in the way of progress

          Show
          janhoy Jan Høydahl added a comment - Ok, don't let me stand in the way of progress
          Hide
          desultir Tony Moriarty added a comment -

          Apologies for going silent on this; life got in the way. In my fork of github master i've implemented XLSXResponseWriter, and TestXLSXResponseWriter. All unit tests are passing and output looks good, i'll generate a pull request in a day or two

          Show
          desultir Tony Moriarty added a comment - Apologies for going silent on this; life got in the way. In my fork of github master i've implemented XLSXResponseWriter, and TestXLSXResponseWriter. All unit tests are passing and output looks good, i'll generate a pull request in a day or two
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user desultir opened a pull request:

          https://github.com/apache/lucene-solr/pull/59

          SOLR-9127: XLSXResponseWriter

          https://issues.apache.org/jira/browse/SOLR-9127

          New response writer and test case (TextXLSXResponseWriter)

          passes ant precommit

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/desultir/lucene-solr master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/59.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #59


          commit 0cb68f3e81a9e9871fa3de216baa06910965e3d5
          Author: desultir <desultir@users.noreply.github.com>
          Date: 2016-06-28T06:41:28Z

          Merge pull request #1 from apache/master

          Pull updates from SOLR head

          commit 30baf43a6916cf86d3f41e154a5bcd941ad627b8
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-06-28T06:41:59Z

          initial changes to support XLSX - POI dependency and initial commit of responsewriter

          commit 9df823bca7d454037a4b46dcb9aa56bbdeda84e1
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-06-28T06:42:33Z

          Merge branch 'master' of github.com:desultir/lucene-solr

          commit 86d922854f9b14ec79fb8d5d7fd8c6bbb4c96b60
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-06-29T06:19:01Z

          further patching for SOLR against 6.0 master

          commit 57c60e9f3e654c94dd60a823d02de36fbd07cc7d
          Author: desultir <desultir@users.noreply.github.com>
          Date: 2016-06-29T06:29:31Z

          Merge pull request #2 from apache/master

          pull more changes from solr head

          commit 4db64aeddb5458689fab1475248c29118b064f95
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-07-14T05:11:21Z

          add missing dependency on xml-beans (already in lucene) and initial running version of test

          commit 4ed97f03e6a52aaac12c7ecbb5ce770d96f53ec7
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-08-01T04:36:22Z

          take column widths/metanames from solrconfig; pull in relevant tests from TestCSVResponseWriter

          commit ce035ce77f8beb71cb059e9e097f8798c9eccbc5
          Author: desultir <desultir@users.noreply.github.com>
          Date: 2016-08-01T04:41:26Z

          Merge pull request #3 from apache/master

          pull from upstream finally

          commit 4fed3ab1591fd69073e80c4b576b996f44698b33
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-08-01T04:48:52Z

          remove unused write-to-disk of xlsx output

          commit 30c93acfb5f1d22c6f0fcf7640946385957d270d
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-08-01T05:00:17Z

          AL header on new file

          commit 8edcb84b482e3f9c5fa80cc8984d8a6271d6ceda
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-08-01T05:15:24Z

          remove unused imports


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user desultir opened a pull request: https://github.com/apache/lucene-solr/pull/59 SOLR-9127 : XLSXResponseWriter https://issues.apache.org/jira/browse/SOLR-9127 New response writer and test case (TextXLSXResponseWriter) passes ant precommit You can merge this pull request into a Git repository by running: $ git pull https://github.com/desultir/lucene-solr master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/59.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #59 commit 0cb68f3e81a9e9871fa3de216baa06910965e3d5 Author: desultir <desultir@users.noreply.github.com> Date: 2016-06-28T06:41:28Z Merge pull request #1 from apache/master Pull updates from SOLR head commit 30baf43a6916cf86d3f41e154a5bcd941ad627b8 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-06-28T06:41:59Z initial changes to support XLSX - POI dependency and initial commit of responsewriter commit 9df823bca7d454037a4b46dcb9aa56bbdeda84e1 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-06-28T06:42:33Z Merge branch 'master' of github.com:desultir/lucene-solr commit 86d922854f9b14ec79fb8d5d7fd8c6bbb4c96b60 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-06-29T06:19:01Z further patching for SOLR against 6.0 master commit 57c60e9f3e654c94dd60a823d02de36fbd07cc7d Author: desultir <desultir@users.noreply.github.com> Date: 2016-06-29T06:29:31Z Merge pull request #2 from apache/master pull more changes from solr head commit 4db64aeddb5458689fab1475248c29118b064f95 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-07-14T05:11:21Z add missing dependency on xml-beans (already in lucene) and initial running version of test commit 4ed97f03e6a52aaac12c7ecbb5ce770d96f53ec7 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-08-01T04:36:22Z take column widths/metanames from solrconfig; pull in relevant tests from TestCSVResponseWriter commit ce035ce77f8beb71cb059e9e097f8798c9eccbc5 Author: desultir <desultir@users.noreply.github.com> Date: 2016-08-01T04:41:26Z Merge pull request #3 from apache/master pull from upstream finally commit 4fed3ab1591fd69073e80c4b576b996f44698b33 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-08-01T04:48:52Z remove unused write-to-disk of xlsx output commit 30c93acfb5f1d22c6f0fcf7640946385957d270d Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-08-01T05:00:17Z AL header on new file commit 8edcb84b482e3f9c5fa80cc8984d8a6271d6ceda Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-08-01T05:15:24Z remove unused imports
          Hide
          desultir Tony Moriarty added a comment -

          Quicker than expected: here's pull request 59
          https://github.com/apache/lucene-solr/pull/59

          Added a dependency from SOLR onto org.apache.poi and org.apache.xmlbeans - both of which Lucene already depended upon

          Show
          desultir Tony Moriarty added a comment - Quicker than expected: here's pull request 59 https://github.com/apache/lucene-solr/pull/59 Added a dependency from SOLR onto org.apache.poi and org.apache.xmlbeans - both of which Lucene already depended upon
          Hide
          janhoy Jan Høydahl added a comment -

          This patch adds POI to solr core, inside the solr-webapp, while the existing poi dependency was only in contrib/extraction. I'm not happy about this code in Solr at all, but at least make it be a new contrib instead of in core!

          The wt adds 694 lines of Java code and 597 lines of test config (that is quite much more than the 100-lines you were talking about Noble Paul)
          Tony Moriarty, you can easily shrink solrconfig-xlsxresponsewriter.xml to just the few lines needed to setup the wt, delete the rest and use defaults.

          Thinking about the configuration where you currently allow "columnWidths" and "columnNames". Instead of going down that route, adding new config such as color, height, font, borders etc in the future, why not simply take a file argument to a template.xlsx file, and let the first row be column headers, the second row be solr field names, and just clone the sheet and keep all other formatting?

          Show
          janhoy Jan Høydahl added a comment - This patch adds POI to solr core, inside the solr-webapp , while the existing poi dependency was only in contrib/extraction . I'm not happy about this code in Solr at all, but at least make it be a new contrib instead of in core! The wt adds 694 lines of Java code and 597 lines of test config (that is quite much more than the 100-lines you were talking about Noble Paul ) Tony Moriarty , you can easily shrink solrconfig-xlsxresponsewriter.xml to just the few lines needed to setup the wt, delete the rest and use defaults. Thinking about the configuration where you currently allow "columnWidths" and "columnNames". Instead of going down that route, adding new config such as color, height, font, borders etc in the future, why not simply take a file argument to a template.xlsx file, and let the first row be column headers, the second row be solr field names, and just clone the sheet and keep all other formatting?
          Hide
          dsmiley David Smiley added a comment -

          This patch adds POI to solr core, inside the solr-webapp, while the existing poi dependency was only in contrib/extraction.

          Adding a dependency on POI to solr-core is going too far, IMO. I was with Jan's "-0" vote on this but that was before seeing POI. Do we really need POI? Is there some XML we can emit for this feature instead? https://en.wikipedia.org/wiki/Office_Open_XML

          Alternatively, add this POI based WT to the extraction contrib, and maybe add some hook or another to get the WT registered automatically if it's on the classpath.

          Show
          dsmiley David Smiley added a comment - This patch adds POI to solr core, inside the solr-webapp, while the existing poi dependency was only in contrib/extraction. Adding a dependency on POI to solr-core is going too far, IMO. I was with Jan's "-0" vote on this but that was before seeing POI. Do we really need POI? Is there some XML we can emit for this feature instead? https://en.wikipedia.org/wiki/Office_Open_XML Alternatively, add this POI based WT to the extraction contrib, and maybe add some hook or another to get the WT registered automatically if it's on the classpath.
          Hide
          noble.paul Noble Paul added a comment -

          This patch adds POI to solr core, inside the solr-webapp, while the existing poi dependency was only in contrib/extraction.

          Adding POI dependency to solr core is not advisable. It should be OK to add this to contrib/extraction

          Show
          noble.paul Noble Paul added a comment - This patch adds POI to solr core, inside the solr-webapp, while the existing poi dependency was only in contrib/extraction. Adding POI dependency to solr core is not advisable. It should be OK to add this to contrib/extraction
          Hide
          janhoy Jan Høydahl added a comment -

          Putting it in extraction just because they share one jar is bad practice, it breaks the cohesion of that contrib. It should be a separate contrib, if any.

          I would (again) propose for this to be published as a jar plugin plugin in its own GitHub repo or elsewhere outside the ASF. Then, if it gets very popular and people for some reason want it into the core product, then we could have another discussion. This is the same adoption pattern that e.g. Solr Cell once took.

          Show
          janhoy Jan Høydahl added a comment - Putting it in extraction just because they share one jar is bad practice, it breaks the cohesion of that contrib. It should be a separate contrib, if any. I would (again) propose for this to be published as a jar plugin plugin in its own GitHub repo or elsewhere outside the ASF. Then, if it gets very popular and people for some reason want it into the core product, then we could have another discussion. This is the same adoption pattern that e.g. Solr Cell once took.
          Hide
          noble.paul Noble Paul added a comment -

          this to be published as a jar plugin plugin in its own GitHub repo or elsewhere outside the ASF

          That's useless for users. It won't be a part of the distro. There won't be any documentation in our ref guide. There will be no tickets opened in this JIRA.

          Show
          noble.paul Noble Paul added a comment - this to be published as a jar plugin plugin in its own GitHub repo or elsewhere outside the ASF That's useless for users. It won't be a part of the distro. There won't be any documentation in our ref guide. There will be no tickets opened in this JIRA.
          Hide
          desultir Tony Moriarty added a comment -

          Thanks for reviewing the PR everyone. So from your feedback what i propose as next steps are:

          • remove from core; add to the extraction contrib
          • shrink solrconfig-xlsxresponsewriter.xml to just the few lines needed to setup the wt, delete the rest and use defaults, trim some bloat from the two java files
          • add a hook to get the WT registered automatically if jar on the classpath

          As to the self-hosted JAR plugin option, that's been available (SOLR4.6 version but with an offer to update if there was interest) on my github for over a year now. A single person found it and that was after I started this thread.

          Happy to go with the consensus though; if it's decided that's the way forward i can repackage this SOLR6 compliant version.

          Show
          desultir Tony Moriarty added a comment - Thanks for reviewing the PR everyone. So from your feedback what i propose as next steps are: remove from core; add to the extraction contrib shrink solrconfig-xlsxresponsewriter.xml to just the few lines needed to setup the wt, delete the rest and use defaults, trim some bloat from the two java files add a hook to get the WT registered automatically if jar on the classpath As to the self-hosted JAR plugin option, that's been available (SOLR4.6 version but with an offer to update if there was interest) on my github for over a year now. A single person found it and that was after I started this thread. Happy to go with the consensus though; if it's decided that's the way forward i can repackage this SOLR6 compliant version.
          Hide
          desultir Tony Moriarty added a comment - - edited

          Hi everyone,

          I'm coming along with my solr contrib (within solrcell) version of XLSXResponseWriter. One issue i'm having is with David Smiley's suggestion to "add some hook or another to get the WT registered automatically if it's on the classpath"

          The two ways i know of registering a QueryResponseWriter are as part of DEFAULT_RESPONSE_WRITERS in core/../solr/core/SolrCore.java or an entry in solrconfig.xml.

          Is there code which is executed automatically which is perhaps SolrCoreAware when a contrib is on the classpath or is it all reliant on the contrib being imported in solrconfig.xml? If the latter then i may have to let this point go.

          Show
          desultir Tony Moriarty added a comment - - edited Hi everyone, I'm coming along with my solr contrib (within solrcell) version of XLSXResponseWriter. One issue i'm having is with David Smiley's suggestion to "add some hook or another to get the WT registered automatically if it's on the classpath" The two ways i know of registering a QueryResponseWriter are as part of DEFAULT_RESPONSE_WRITERS in core/../solr/core/SolrCore.java or an entry in solrconfig.xml. Is there code which is executed automatically which is perhaps SolrCoreAware when a contrib is on the classpath or is it all reliant on the contrib being imported in solrconfig.xml? If the latter then i may have to let this point go.
          Hide
          noble.paul Noble Paul added a comment -

          We can hardcode these things in the solrcore code

          Show
          noble.paul Noble Paul added a comment - We can hardcode these things in the solrcore code
          Hide
          desultir Tony Moriarty added a comment -

          OK, in that case my PR is ready for perusal again. Changes:

          • removed from core; add to the extraction contrib
          • removed solrconfig-xlsxresponsewriter.xml; added just the few lines needed to setup the wt to the extraction solrconfig.xml as a non-default writer
          • Trimmed some surplus tests from the test file

          It stands at 702 lines added now of which 419 are the writer itself.

          Show
          desultir Tony Moriarty added a comment - OK, in that case my PR is ready for perusal again. Changes: removed from core; add to the extraction contrib removed solrconfig-xlsxresponsewriter.xml; added just the few lines needed to setup the wt to the extraction solrconfig.xml as a non-default writer Trimmed some surplus tests from the test file It stands at 702 lines added now of which 419 are the writer itself.
          Hide
          noble.paul Noble Paul added a comment - - edited

          added the logic of automatically loading xlsx writer automatically if available in classpath

          I think it should be possible to pass the columnNames and {{columnWidths}}as request parameters. Then, the user never needs to configure anything

          example: columnWidth.id=3&columnName.id=ID

          Show
          noble.paul Noble Paul added a comment - - edited added the logic of automatically loading xlsx writer automatically if available in classpath I think it should be possible to pass the columnNames and {{columnWidths}}as request parameters. Then, the user never needs to configure anything example: columnWidth.id=3&columnName.id=ID
          Hide
          desultir Tony Moriarty added a comment -

          Thanks for the patch Noble Paul, I'll integrate it and also add columnNames and columnWidths as request parameters by field

          Show
          desultir Tony Moriarty added a comment - Thanks for the patch Noble Paul, I'll integrate it and also add columnNames and columnWidths as request parameters by field
          Hide
          desultir Tony Moriarty added a comment - - edited

          OK integrated. PR ready for review again. Update to the documentation (changed the solrconfig attribute names too):

          Column Width and mapping of metadata field names to nice names is configured for the responsewriter; this can be done in the solrconfig or through request params colname.foo and colwidth.foo.

          <queryResponseWriter name="xlsx"
          class="org.apache.solr.request.XLSXResponseWriter">
          <lst name="colwidths">
          <int name="id">8</int>
          <int name="name">128</int>
          ...
          </lst>
          <lst name="colnames">
          <str name="id">Product ID</str>
          <str name="name">Product Name</str>
          ...
          </lst>
          </queryResponseWriter>

          But on review i've noticed Jan's older suggestion of using field request parameters ie "f.id.xlsx.colname=Product ID" to do the renaming. Should I update it to use field params?

          Show
          desultir Tony Moriarty added a comment - - edited OK integrated. PR ready for review again. Update to the documentation (changed the solrconfig attribute names too): Column Width and mapping of metadata field names to nice names is configured for the responsewriter; this can be done in the solrconfig or through request params colname.foo and colwidth.foo. <queryResponseWriter name="xlsx" class="org.apache.solr.request.XLSXResponseWriter"> <lst name="colwidths"> <int name="id">8</int> <int name="name">128</int> ... </lst> <lst name="colnames"> <str name="id">Product ID</str> <str name="name">Product Name</str> ... </lst> </queryResponseWriter> But on review i've noticed Jan's older suggestion of using field request parameters ie "f.id.xlsx.colname=Product ID" to do the renaming. Should I update it to use field params?
          Hide
          noble.paul Noble Paul added a comment -

          let's get rid of the init params altogether. The rest looks fine. I don't think any ResponseWriter uses init params

          Show
          noble.paul Noble Paul added a comment - let's get rid of the init params altogether. The rest looks fine. I don't think any ResponseWriter uses init params
          Hide
          desultir Tony Moriarty added a comment - - edited

          OK, i stripped out support for solrconfig init params and updated the tests accordingly. The PR is ready for review again

          An additional note for the documentation that came up during testing:
          The maximum value for a colwidth, as per poi limitations, is 255

          Show
          desultir Tony Moriarty added a comment - - edited OK, i stripped out support for solrconfig init params and updated the tests accordingly. The PR is ready for review again An additional note for the documentation that came up during testing: The maximum value for a colwidth, as per poi limitations, is 255
          Hide
          noble.paul Noble Paul added a comment - - edited

          where is the pull request? if possible attach a patch

          Show
          noble.paul Noble Paul added a comment - - edited where is the pull request? if possible attach a patch
          Hide
          desultir Tony Moriarty added a comment -

          Apologies for not being clear: it's the same PR as originally linked in the thread at:
          https://github.com/apache/lucene-solr/pull/59

          The patch is noisy as it involves commits which were later undone insofar as i added things to solr core and then removed:
          https://patch-diff.githubusercontent.com/raw/apache/lucene-solr/pull/59.patch

          Will this noisy patch be an issue?

          Show
          desultir Tony Moriarty added a comment - Apologies for not being clear: it's the same PR as originally linked in the thread at: https://github.com/apache/lucene-solr/pull/59 The patch is noisy as it involves commits which were later undone insofar as i added things to solr core and then removed: https://patch-diff.githubusercontent.com/raw/apache/lucene-solr/pull/59.patch Will this noisy patch be an issue?
          Hide
          noble.paul Noble Paul added a comment -

          yes. it's noisy and I don't know what is relevant and what is not. attach a real patch file to this ticket.

          Show
          noble.paul Noble Paul added a comment - yes. it's noisy and I don't know what is relevant and what is not. attach a real patch file to this ticket.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user desultir closed the pull request at:

          https://github.com/apache/lucene-solr/pull/59

          Show
          githubbot ASF GitHub Bot added a comment - Github user desultir closed the pull request at: https://github.com/apache/lucene-solr/pull/59
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user desultir commented on the issue:

          https://github.com/apache/lucene-solr/pull/59

          closing to issue cleaner PR

          Show
          githubbot ASF GitHub Bot added a comment - Github user desultir commented on the issue: https://github.com/apache/lucene-solr/pull/59 closing to issue cleaner PR
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user desultir opened a pull request:

          https://github.com/apache/lucene-solr/pull/77

          SOLR-9127 XlsxResponseWriter

          https://issues.apache.org/jira/browse/SOLR-9127

          New response writer and test case (TextXLSXResponseWriter)

          passes ant precommit
          all work squashed into one commit for easy perusal

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/desultir/lucene-solr xlsx

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/77.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #77


          commit 05363c347bb39efb24919eeb474280ca0edff753
          Author: Tony Moriarty <tdm@qti.qualcomm.com>
          Date: 2016-09-06T05:34:46Z

          full squashed patch


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user desultir opened a pull request: https://github.com/apache/lucene-solr/pull/77 SOLR-9127 XlsxResponseWriter https://issues.apache.org/jira/browse/SOLR-9127 New response writer and test case (TextXLSXResponseWriter) passes ant precommit all work squashed into one commit for easy perusal You can merge this pull request into a Git repository by running: $ git pull https://github.com/desultir/lucene-solr xlsx Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/77.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #77 commit 05363c347bb39efb24919eeb474280ca0edff753 Author: Tony Moriarty <tdm@qti.qualcomm.com> Date: 2016-09-06T05:34:46Z full squashed patch
          Hide
          desultir Tony Moriarty added a comment - - edited

          added https://issues.apache.org/jira/secure/attachment/12827123/9127-xlsxresponsewriter.patch
          full patch as per PR
          https://github.com/apache/lucene-solr/pull/77

          replaces first patch (SOLR-9127.patch)

          all commits squashed into one for easy review

          Show
          desultir Tony Moriarty added a comment - - edited added https://issues.apache.org/jira/secure/attachment/12827123/9127-xlsxresponsewriter.patch full patch as per PR https://github.com/apache/lucene-solr/pull/77 replaces first patch ( SOLR-9127 .patch) all commits squashed into one for easy review
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1a61fb68588d489dfab966dc3e3a894cae7d5b5f in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a61fb6 ]

          SOLR-9127: Excel workbook (.xlsx) response writer. use 'wt=xlsx'

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1a61fb68588d489dfab966dc3e3a894cae7d5b5f in lucene-solr's branch refs/heads/master from Noble Paul [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a61fb6 ] SOLR-9127 : Excel workbook (.xlsx) response writer. use 'wt=xlsx'
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 932c436d07dd3fd2cef0f1cb8319d5953433687e in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=932c436 ]

          SOLR-9127: Excel workbook (.xlsx) response writer. use 'wt=xlsx'

          Show
          jira-bot ASF subversion and git services added a comment - Commit 932c436d07dd3fd2cef0f1cb8319d5953433687e in lucene-solr's branch refs/heads/branch_6x from Noble Paul [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=932c436 ] SOLR-9127 : Excel workbook (.xlsx) response writer. use 'wt=xlsx'
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1a61fb68588d489dfab966dc3e3a894cae7d5b5f in lucene-solr's branch refs/heads/apiv2 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a61fb6 ]

          SOLR-9127: Excel workbook (.xlsx) response writer. use 'wt=xlsx'

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1a61fb68588d489dfab966dc3e3a894cae7d5b5f in lucene-solr's branch refs/heads/apiv2 from Noble Paul [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a61fb6 ] SOLR-9127 : Excel workbook (.xlsx) response writer. use 'wt=xlsx'
          Hide
          ctargett Cassandra Targett added a comment -

          I'm trying to add documentation for this to the Solr Ref Guide, but am not able to get it to work (I don't think). When I add 'wt=xlsx' to a request, the response looks exactly like the default xml. Maybe that's fine - I know XLSX is XML - but it would seem it should be different. I tried to add the example snippet above to solrconfig.xml, but got errors when I reloaded the core. I played out with the classname and still didn't work.

          So, what config does someone need to use this? If it's supposed to work without any config, what is the response supposed to look like?

          Show
          ctargett Cassandra Targett added a comment - I'm trying to add documentation for this to the Solr Ref Guide, but am not able to get it to work (I don't think). When I add 'wt=xlsx' to a request, the response looks exactly like the default xml. Maybe that's fine - I know XLSX is XML - but it would seem it should be different. I tried to add the example snippet above to solrconfig.xml, but got errors when I reloaded the core. I played out with the classname and still didn't work. So, what config does someone need to use this? If it's supposed to work without any config, what is the response supposed to look like?
          Hide
          dsmiley David Smiley added a comment -

          Quick comment: you need Tika on the classpath, well POI specifically. That's the main limitation/gotcha.

          Show
          dsmiley David Smiley added a comment - Quick comment: you need Tika on the classpath, well POI specifically. That's the main limitation/gotcha.
          Hide
          noble.paul Noble Paul added a comment - - edited

          It requires all of extraction/lib/*.jar and dist/solr-cell-6.3.0.jar in the server classpath. it's not enough to have it in the core classpath.

          Show
          noble.paul Noble Paul added a comment - - edited It requires all of extraction/lib/*.jar and dist/solr-cell-6.3.0.jar in the server classpath. it's not enough to have it in the core classpath.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

            People

            • Assignee:
              noble.paul Noble Paul
              Reporter:
              desultir Tony Moriarty
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development