Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: None
    • Labels:
      None

      Description

      A way to efficiently load simple formatted text files, including CSV files.

      1. csv.patch
        26 kB
        Yonik Seeley
      2. commons-csv-20061121.jar
        24 kB
        Yonik Seeley
      3. csv.patch
        28 kB
        Yonik Seeley

        Issue Links

          Activity

          Yonik Seeley created issue -
          Yonik Seeley added a comment -

          To most efficiently load a large export file from some other datasource, we should allow an upload of a local file and avoid going through the network/servlet container for each individual record.

          Way to slurp a local file: a post to /solr/upload/localfile?file=full_path_to_file
          Parameter Ideas:
          file #the full path name of the local file
          separator=, #the field separator
          charset=utf8
          commit=false #do a commit after finished
          fieldnames=foo,bar,baz #define field names, if not taken from the header
          header=true #read field names from the header
          skip=baz #fields to skip
          skiplines=1 #number of lines at the start of the file to skip
          skipempty=true #don't index zero length values
          trim=true #trim whitespace from field values

          map=Yes:true #map a field value

          #per-field params
          f.myfield.map=No:false

          escape=true #backslash escaping
          quotedfields=true #optionally quoted fields (like CSV)

          csv=true #sets options correctly for a CSV file

          #support comments in the file?
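As a rough sketch of how the separator/trim/skipempty parameter ideas above might interact (a hypothetical illustration only, not the actual Solr code — the class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the separator/trim/skipempty parameters
// proposed above; names are invented for illustration.
public class SimpleSplitter {
    public static List<String> split(String line, char separator,
                                     boolean trim, boolean skipEmpty) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == separator) {
                String val = line.substring(start, i);
                if (trim) val = val.trim();          // trim=true
                if (!(skipEmpty && val.isEmpty()))   // skipempty=true
                    out.add(val);
                start = i + 1;
            }
        }
        return out;
    }
}
```

For example, `split("foo, bar,,baz", ',', true, true)` would yield `[foo, bar, baz]`: the second value is trimmed and the empty third value is dropped.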

          Yonik Seeley added a comment -

          Should we also somehow support uploading a text file over the network?
          HTTP-POST of the file would be the obvious thing, but then how are parameters specified?
          Multi-part post?

          Erik Hatcher added a comment -

          What about having an XSL transformation on the input to Solr as well? This would allow someone to POST in XML documents of any variety, but an XSL would turn it into the field definitions. This would certainly increase the appeal of Solr in my (library) domain - a standard TEI -> Solr stylesheet would allow folks to POST into Solr without doing much on the client end at all.
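The XSLT idea can be sketched with the standard JDK `javax.xml.transform` API: apply a stylesheet to incoming XML to produce Solr's `<add><doc>` format before indexing. The stylesheet and element names below are invented for illustration; this is not Solr code.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Sketch of preprocessing arbitrary XML into Solr's update format via
// XSLT. SAMPLE_XSL and the <record>/<title> input are hypothetical.
public class XslPreprocessor {
    public static final String SAMPLE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output omit-xml-declaration='yes'/>" +
        "<xsl:template match='/record'>" +
        "<add><doc><field name='title'><xsl:value-of select='title'/></field></doc></add>" +
        "</xsl:template></xsl:stylesheet>";

    public static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A client could then POST the transformed string to Solr's update URL without changing anything on the client side beyond supplying the stylesheet.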

          Fuad Efendi added a comment -

          Encoding:
          How to encode 'comma'?
          How to encode UTF-8?
          Should we use Base64 and encode raw values?

          http://rfc.net/rfc4180.html:
          "Common usage of CSV is US-ASCII, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter."

          http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
          http://www.edoceo.com/utilis/csv-file-format.php
          http://www.ricebridge.com/products/csvman/reference.htm

          This is interesting (from last link):
          FIELD: [trim]? ( UNQUOTED | QUOTED ) [trim]?
          UNQUOTED: ( [data]* | ESCAPE )*;
          QUOTED: [quote] ( DOUBLE | ESCAPE | [data]* )* [quote]
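A literal reading of the FIELD/QUOTED grammar quoted above could be sketched like this in Java: fields are optionally surrounded by double quotes, and an embedded quote inside a quoted field is written as two quotes (the DOUBLE production). This is an illustration of the grammar, not code from any of the linked libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the UNQUOTED/QUOTED/DOUBLE productions from the grammar
// above; handles quoted fields and doubled embedded quotes.
public class QuotedCsv {
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); i++;          // DOUBLE: "" -> "
                    } else {
                        inQuotes = false;              // closing [quote]
                    }
                } else cur.append(c);
            } else if (c == '"') {
                inQuotes = true;                       // opening [quote]
            } else if (c == ',') {
                fields.add(cur.toString()); cur.setLength(0);
            } else cur.append(c);
        }
        fields.add(cur.toString());
        return fields;
    }
}
```

For example, the line `a,"b,c","x""y"` parses to the three values `a`, `b,c`, and `x"y`.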

          Fuad Efendi added a comment -

          /sorry for not having access to E-mail and using POST temporarily.../

          HTTP-POST: should work without any code changes.

          In /resources/admin/index.jsp, <form name=queryForm method="GET" action="../select/">. Simply replace GET with POST, and everything should work ...

          You have the following in org.apache.solr.servlet.SolrServlet:

          public void doPost(HttpServletRequest request, HttpServletResponse response)
              throws ServletException, IOException {
            doGet(request, response);
          }

          And you are using the standard Servlet API to retrieve ServletRequest parameters,
          http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletRequest.html#getParameterMap()

          public class ServletSolrParams extends MultiMapSolrParams {
            public ServletSolrParams(ServletRequest req) {
              super(req.getParameterMap());
            }
          }

          Existing SOLR should work with POST HTML forms without any change in Java...

          Yonik Seeley added a comment -

          > Existing SOLR should work with POST HTML forms without any change in Java...

          Yes, posting queries work because it's all form-data (query args).
          But, what if we want to post a complete file, and some extra info/parameters about how that file should be handled?

          Hoss Man added a comment -

          Fuad: the issue isn't really whether POSTed queries work ... those have been tested and are known to work ... it's more a question of POSTed updates ... the current update mechanism does not use "application/x-www-form-urlencoded"; instead the raw POST body is read as an XML message containing docs to index.

          This issue is attempting to address a more convenient method to bulk import records, possibly using CSV, and probably using a local file – but we'd want to support a POSTed file as well, so there was some discussion (on list) of how to POST both a file and send query params (using either "application/x-www-form-urlencoded" or the mechanism we currently use)

          Yonik Seeley added a comment -

          > How to encode 'comma'?

          For standard CSV, you could quote the entire field value... "a,b"
          I don't know if Commons CSV supports backslash escaping or not, but that would be another way.

          > How to encode UTF-8?

          Two ways... the user can define a charset for the file (and the file could actually be UTF-8),
          and we can support unicode escapes \u1234

          > Should we use Base64 and encode raw values?

          I hadn't thought about binary fields (they aren't even supported in the XML update yet).
          Doing Base64 would seem relatively easy though.
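The quoting answer above could be sketched as a small encoder: a value containing the separator, a quote, or a newline is wrapped in quotes, with any embedded quotes doubled. This is an illustration only, not the Commons CSV API.

```java
// Sketch of encoding a field value for standard CSV, per the quoting
// discussion above; a hypothetical helper, not a Commons CSV method.
public class CsvEncode {
    public static String encode(String val) {
        // Values without reserved characters pass through unchanged.
        if (val.indexOf(',') < 0 && val.indexOf('"') < 0 && val.indexOf('\n') < 0)
            return val;
        // Otherwise wrap in quotes and double any embedded quotes.
        return '"' + val.replace("\"", "\"\"") + '"';
    }
}
```

So `a,b` becomes `"a,b"`, and `say "hi"` becomes `"say ""hi"""`.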

          Fuad Efendi added a comment -

          Sorry for not correctly understanding the multipart HTTP POST / File Upload issue, it's not easy, I just browsed sources of org.springframework.web.multipart.support (although it's very easy with Spring...)

          Fuad Efendi added a comment -

          CSV:

          • should we support standard CSVs generated by Excel, Oracle DataPump, etc?

          XML: we currently preprocess some data to create XML, then we post it to SOLR.

          Can we preprocess standard CSV? For instance, we have two tables: CATEGORY (parent), PRODUCT (child)
          CSV produced by Oracle might look like

          001,IBM,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
          001,IBM,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"

          Here, [001,Paper] is a single record from CATEGORY table, and rest is PK, SKU, NAME from PRODUCT table.

          1. Use 'extended' CSV such as
          001,Paper,multi-value:"001,17R7021,14 7/8 X 8 1/2"" - 1/2"" Greenbar002,17R8018,8 1/2 x 11"" Micro Perf @ 3 2/3"""
          (multi-value:"<comma separated>,...")

          • very difficult... and not compatible with exported data...

          2. Standard CSV with fixed width + preprocessing (sorting, and removing repeated values)

          001,Paper,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
          001,,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"

          We removed repeated value 'Paper', but we left Primary Key of this Category intact... It should work with both, standard 'large' CSV and preprocessed one... And, we don't have huge single line in case of IBM producing different kinds of paper...; we have multi-line with fixed width... First column (repeated 001 value) is primary key, same as <field name="id">001</field>

          Fuad Efendi added a comment -

          mistake... (Paper, instead of IBM):

          001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar
          001, Paper, 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3"
          ...

          optimized:
          001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar
          001, , 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3"
          ...

          Fuad Efendi added a comment -

          Another sample...

          <add><doc>
          <field name="id">9885A004</field>
          <field name="name">Canon PowerShot SD500</field>
          <field name="manu">Canon Inc.</field>
          <field name="cat">electronics</field>
          <field name="cat">camera</field>
          <field name="features">3x zoop, 7.1 megapixel Digital ELPH</field>
          <field name="features">movie clips up to 640x480 @30 fps</field>
          <field name="features">2.0" TFT LCD, 118,000 pixels</field>
          <field name="features">built in flash, red-eye reduction</field>
          <field name="includes">32MB SD card, USB cable, AV cable, battery</field>
          <field name="weight">6.4</field>
          <field name="price">329.95</field>
          <field name="popularity">7</field>
          <field name="inStock">true</field>
          </doc></add>

          Cartesian Product of Facets (can I say that?):
          ====================================
          9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 7.1 megapixel Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, movie clips up to 640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 2.0" TFT LCD, 118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, built in flash, red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 7.1 megapixel Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, movie clips up to 640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 2.0" TFT LCD, 118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
          9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, built in flash, red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true

          Optimized CSV (just for improved network traffic for areas without DSL!!!):
          9885A004, Canon PowerShot SD500, Canon Inc., electronics, "3x zoop, 7.1 megapixel Digital ELPH", "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7 true
          9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
          9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
          9885A004, , , , "built in flash, red-eye reduction", , , , ,
          9885A004, , , camera, "3x zoop, 7.1 megapixel Digital ELPH",
          9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
          9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
          9885A004, , , , "built in flash, red-eye reduction", , , , ,

          almost EDI... XML looks much better... Maybe a specific GZIP version for a standard "Cartesian" CSV?

          Fuad Efendi added a comment -

          This is probably SOLR-specific (the best? be focused on task?)...

          We stick on 4-column format for everything (in case of surrogate PK we may have [id,"001,003,abxc"]):

          id,9885A004,name,Canon PowerShot SD500
          id,9885A004,manu,Canon Inc.
          id,9885A004,cat,electronics
          id,9885A004,cat,camera
          id,9885A004,features,"3x zoop, 7.1 megapixel Digital ELPH"
          id,9885A004,features,movie clips up to 640x480 @30 fps
          id,9885A004,features,"2.0"" TFT LCD, 118,000 pixels"
          id,9885A004,features,"built in flash, red-eye reduction"
          id,9885A004,includes,"32MB SD card, USB cable, AV cable, battery"
          id,9885A004,weight,6.4
          id,9885A004,price,329.95
          id,9885A004,popularity,7
          id,9885A004,inStock,true
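The 4-column format above can be sketched as a grouping step: each row is keyField,keyValue,fieldName,fieldValue, and rows sharing a key value collapse into one multi-valued document. A hypothetical illustration, not Solr code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the 4-column proposal above: group rows by key value (r[1])
// into a map of fieldName -> values. Names are invented for illustration.
public class FourColumn {
    public static Map<String, Map<String, List<String>>> group(List<String[]> rows) {
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (String[] r : rows) {
            docs.computeIfAbsent(r[1], k -> new LinkedHashMap<>())  // one doc per key
                .computeIfAbsent(r[2], k -> new ArrayList<>())      // one list per field
                .add(r[3]);                                         // multi-valued field
        }
        return docs;
    }
}
```

With the two `cat` rows above, document 9885A004 would get the multi-valued field cat = [electronics, camera], matching the repeated `<field name="cat">` elements in the XML sample.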

          Fuad Efendi added a comment -

          even 3-column...
          BTW, good SOLR-targeted single-table database design:
          <PrimaryKey>,<FieldName>,<FieldValue>
          Yes, we can use even index-organized tables in Oracle, without repeated 'parent' values!
          And good standard for CNET customers sending them daily updated product info (is it really search engine?...)
          Thanks

          Yonik Seeley added a comment -

          Here's a first cut on a CSV loader.

          You can load the example data file with the following command:
          curl 'http://localhost:8983/solr/upload/csv' --data 'file=./exampledocs/books.csv'

          This version only implements local file uploading. Perhaps there should be a separate URL for actually posting the CSV file itself?

          Supported parameters:
          file – name of the file to load (needs to be fully qualified, or relative to $CWD)
          charset – default is UTF-8
          separator – default is ,
          fieldnames – can specify or override the names of the columns
          header – "true" if the file contains a header with the fieldnames
          skip – list of fields not to index
          map – maps one value to another... from:to; either from or to can be empty, and multiple rules may be specified.
          keepEmpty – index zero length values
          split – do CSV splitting on a single field value
          encapsulator – char for optionally encapsulating values (needed if reserved char is in val) defaults to "
          commit – automatically commit after loading is finished, default=true

          Per-field overrides for params can be specified via
          f.field.param for the following params: separator, map, keepEmpty, split, encapsulator

          Yonik Seeley made changes -
          Field Original Value New Value
          Attachment csv.patch [ 12345585 ]
          Yonik Seeley added a comment -

          The other thing you will need to build & try this is commons-csv
          http://people.apache.org/builds/jakarta-commons/nightly/commons-csv/

          Yonik Seeley added a comment -

          attached commons-csv nightly build

          Yonik Seeley made changes -
          Attachment commons-csv-20061121.jar [ 12345636 ]
          Ryan McKinley made changes -
          Link This issue is related to SOLR-104 [ SOLR-104 ]
          Yonik Seeley made changes -
          Summary bulk data loader CSV data loader
          Andy Nahapetian added a comment -

          If you change builder.addField in the FieldAdder class to call builder.addField(fields[column].getName(),val,1.0f) instead of builder.addField(fields[column],val,1.0f) then the copyField feature of solr will also work when using the CSVLoader.

          Yonik Seeley added a comment -

          Good catch, thanks Andy!

          Yonik Seeley added a comment -

          Adapted to new update handler framework...
          I think this is about ready to commit.

          Yonik Seeley made changes -
          Attachment csv.patch [ 12354555 ]
          Yonik Seeley added a comment -

          committed.

          Yonik Seeley made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hoss Man added a comment -

          This bug was modified as part of a bulk update using the criteria...

          • Marked ("Resolved" or "Closed") and "Fixed"
          • Had no "Fix Version" versions
          • Was listed in the CHANGES.txt for 1.2

          The Fix Version for all 39 issues found was set to 1.2, email notification
          was suppressed to prevent excessive email.

          For a list of all the issues modified, search jira comments for this
          (hopefully) unique string: 20080415hossman2

          Hoss Man made changes -
          Fix Version/s 1.2 [ 12312235 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Yonik Seeley
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development