Solr / SOLR-3672

SimplePostTool: Improvements for posting files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: scripts and tools
    • Labels:
      None

      Description

      Various improvements to make SimplePostTool more useful

      1. SOLR-3672.patch
        22 kB
        Jan Høydahl
      2. SOLR-3672.patch
        20 kB
        Jan Høydahl
      3. SOLR-3672.patch
        18 kB
        Jan Høydahl
      4. SOLR-3672.patch
        18 kB
        Jan Høydahl

          Activity

          Jan Høydahl added a comment -

          Here's the new help screen for the patch I'm about to attach

          SimplePostTool: version 1.5
          Usage: java [SystemProperties] -jar post.jar [<file|folder> [<file|folder>...]]
          
          Supported System Properties and their defaults:
            -Ddata=files|args|stdin (default=files)
            -Dtype=<content-type> (default=application/xml)
            -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
            -Dauto=yes|no (default=no)
            -Drecursive=yes|no (default=no)
            -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,rtf,htm,html)
            -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
            -Dcommit=yes|no (default=yes)
            -Doptimize=yes|no (default=no)
            -Dout=yes|no (default=no)
          
          This is a simple command line tool for POSTing raw data to a Solr
          port.  Data can be read from files specified as commandline args,
          as raw commandline arg strings, or via STDIN.
          Examples:
            java -jar post.jar *.xml
            java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
            java -Ddata=stdin -jar post.jar < hd.xml
            java -Dtype=text/csv -jar post.jar *.csv
            java -Dtype=application/json -jar post.jar *.json
            java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
            java -Dauto=yes -jar post.jar a.pdf
            java -Dauto=yes -Drecursive=yes -jar post.jar afolder
            java -Dauto=yes -Dfiletypes=ppt,html -jar post.jar afolder
          The options controlled by System Properties include the Solr
          URL to POST to, the Content-Type of the data, whether a commit
          or optimize should be executed, and whether the response should
          be written to STDOUT. If auto=yes the tool will try to guess the file
          type and set type and url automatically. When posting rich documents
          the file name will be propagated as "resource.name" and also used as "literal.id".
          You may override these or any other request parameter through the -Dparams property.
          

          -Dauto=yes : Guesses the file type from the file name suffix and sets type and url accordingly. It also sets the ID and file name automatically.
          -Drecursive=yes : Recurses into sub-folders and indexes all files.
          -Dfiletypes : Specifies the file types to consider when indexing folders.
          -Dparams : HTTP GET parameters to add to the request, so you don't need to write the whole URL again.
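
To illustrate the auto mode described above, here is a minimal sketch of guessing the content type from the file name suffix, filtering by -Dfiletypes, and picking the URL. It is not the code from the patch. The class name, method names and the small suffix table are hypothetical, and routing everything except xml/json/csv to "/extract" is an assumption based only on the help examples.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class AutoTypeSketch {
    // Hypothetical suffix table; the mapping in the actual patch may differ.
    static final Map<String, String> TYPES = new LinkedHashMap<>();
    static {
        TYPES.put("xml", "application/xml");
        TYPES.put("json", "application/json");
        TYPES.put("csv", "text/csv");
        TYPES.put("pdf", "application/pdf");
        TYPES.put("html", "text/html");
    }

    /** Returns the lower-cased suffix after the last dot, or null if there is none. */
    static String suffixOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Guesses the content type from the suffix; null means "unknown". */
    static String guessType(String fileName) {
        String suffix = suffixOf(fileName);
        return suffix == null ? null : TYPES.get(suffix);
    }

    /** True if the file's suffix is in the comma-separated -Dfiletypes list. */
    static boolean matchesFileTypes(String fileName, String fileTypes) {
        String suffix = suffixOf(fileName);
        return suffix != null
            && Arrays.asList(fileTypes.toLowerCase(Locale.ROOT).split(",")).contains(suffix);
    }

    /** Assumption: xml/json/csv use the base update URL, other types go to /extract. */
    static String chooseUrl(String baseUrl, String type) {
        boolean structured = "application/xml".equals(type)
            || "application/json".equals(type)
            || "text/csv".equals(type);
        return structured ? baseUrl : baseUrl + "/extract";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/update";
        for (String name : new String[] {"hd.xml", "a.pdf", "notes.txt"}) {
            String type = guessType(name);
            System.out.println(name + " -> " + type
                + (type == null ? "" : " via " + chooseUrl(base, type)));
        }
    }
}
```

A file with an unknown suffix yields null here, which a caller could treat as "skip with a warning".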

          Jan Høydahl added a comment -

          Any feedback on this? There are no automated tests, but I have tested a full recursive post of my "My Documents" folder and its subfolders. It passed, apart from expected warnings for some .csv files which were not meant for Solr:

          java -Dauto=yes -Drecursive=yes -jar post.jar $HOME
          
          Jan Høydahl added a comment -

          This updated patch skips hidden folders and files
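
As a rough sketch of what skipping hidden entries during a recursive walk could look like (not the patch itself): the class and method names are hypothetical, and treating any name that starts with "." as hidden is an assumption. The patch may well use File.isHidden() instead. The folder named on the command line is not checked in this sketch, only its children.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HiddenSkipSketch {
    /** Assumption: an entry counts as hidden when its name starts with a dot. */
    static boolean isHiddenName(String name) {
        return name.startsWith(".");
    }

    /** Collects non-hidden files under dir, descending only when recursive is true. */
    static void collect(File dir, boolean recursive, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        Arrays.sort(children); // deterministic order
        for (File child : children) {
            if (isHiddenName(child.getName())) {
                continue; // skip hidden files and hidden folders alike
            }
            if (child.isDirectory()) {
                if (recursive) {
                    collect(child, true, out);
                }
            } else {
                out.add(child);
            }
        }
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        collect(new File(args.length > 0 ? args[0] : "."), true, files);
        for (File f : files) {
            System.out.println(f.getPath());
        }
    }
}
```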

          Jan Høydahl added a comment -

          New patch:

          • Allows "." as current dir (although "hidden")
          • Allows "true/on/1/yes" for options, not just "yes"
          • Fixed problem with default type in non-auto mode
          • Fixed typo in help
          • Removed deprecated methods
          • Fixed some javadocs

          Will commit this in a day or two if no comments
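
For the "true/on/1/yes" item above, a minimal sketch of how such an option check might be written. The class and method names are hypothetical, and the case-insensitive comparison and trimming are assumptions, not something the patch description states.

```java
import java.util.Locale;

class BoolOptionSketch {
    /** Returns true for "true", "on", "1" or "yes"; anything else, including null, is false. */
    static boolean isOn(String value) {
        if (value == null) {
            return false;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "on":
            case "1":
            case "yes":
                return true;
            default:
                return false;
        }
    }

    /** Reads a system property such as "auto" or "recursive" with a default value. */
    static boolean isOn(String property, String defaultValue) {
        return isOn(System.getProperty(property, defaultValue));
    }

    public static void main(String[] args) {
        // Try: java -Dauto=on BoolOptionSketch
        System.out.println("auto=" + isOn("auto", "no"));
        System.out.println("commit=" + isOn("commit", "yes"));
    }
}
```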

          Jan Høydahl added a comment -

          New update:

          • Exits with short usage msg if no arguments, instead of attempting a COMMIT
          • To do commit-only, supply a single argument "-"
          • In auto mode, also prints detected content-type in the output
          • Cleaner printout without "SimplePostTool:" prefix from info() method
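
The first two items above could be sketched roughly as follows. This is only an illustration of the described behaviour for the default files mode; the class, enum and method names are hypothetical and not taken from the patch.

```java
class ArgDispatchSketch {
    enum Action { SHOW_USAGE, COMMIT_ONLY, POST_FILES }

    /** Decides what to do based purely on the command line arguments. */
    static Action decide(String[] args) {
        if (args.length == 0) {
            return Action.SHOW_USAGE; // no longer attempts a COMMIT
        }
        if (args.length == 1 && "-".equals(args[0])) {
            return Action.COMMIT_ONLY; // a single "-" means commit without posting
        }
        return Action.POST_FILES;
    }

    public static void main(String[] args) {
        System.out.println(decide(args));
    }
}
```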
          Jan Høydahl added a comment -

          Committed r1367371 to trunk and r1367373 to branch_4x

          Jack Krupansky added a comment -

          Thanks for the great improvements, Jan.

          Only complaint: The CHANGES.TXT should note that rich documents can be posted and their file name automatically passed for indexing.


            People

            • Assignee:
              Jan Høydahl
              Reporter:
              Jan Høydahl
            • Votes:
              0
              Watchers:
              2
