SOLR-6435

Add script to simplify posting content to Solr

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10
    • Fix Version/s: 5.0, 6.0
    • Component/s: scripts and tools
    • Labels:
      None

      Description

      Solr's SimplePostTool (example/exampledocs/post.jar) provides a very useful, simple way to get common types of content into Solr. With the new start scripts and the directory refactoring, let's promote this tool to a first-class, script-fronted tool that no longer lives under "example".

      1. SOLR-6435.patch
        2 kB
        Erik Hatcher
      2. SOLR-6435.patch
        2 kB
        Erik Hatcher

        Activity

        Erik Hatcher added a comment -

        Here's a patch that leverages the fact that SimplePostTool is already in solr-core-*.jar. With this patch you can do "bin/post /some/directory". This is just a quick proof of concept; there are TODOs in the script that need tackling to flesh this out fully.

        Erik Hatcher added a comment -

        I'm doing this kind of thing to demonstrate Solr with the post.jar tool:

        java -classpath example/solr-webapp/webapp/WEB-INF/lib/solr-core-*.jar -Ddata=web -Drecursive=1 -Ddelay=1 -Dc=gettingstarted -Dauto org.apache.solr.util.SimplePostTool $@
        

        That's the kind of thing we can get bin/post to do cleanly for some very common use cases (file, web, data files).

        Timothy Potter added a comment -

        As part of the work I'm doing for SOLR-3619, I'm also invoking the post tool using:

        "$JAVA" -Durl=http://localhost:$SOLR_PORT/solr/$EXAMPLE/update -jar $SOLR_TIP/example/exampledocs/post.jar $SOLR_TIP/example/exampledocs/*.xml

        Of course, this complexity should be hidden behind a simple "bin/solr post" command.

        Also as part of the work in SOLR-3619, the script will be able to auto-detect the port a local Solr is listening to, so that users don't have to do things like:

        bin/solr post -url http://localhost:8983/solr ...

        Erik Hatcher added a comment -

        Current patch works like this:

        # Usage:
        #  bin/post <collection> http://lucidworks.com [depth=1] [delay=1]
        #  bin/post <collection> ~/Documents
        #  bin/post <collection> files*.csv
        #  bin/post <collection> files*.xml
        #  bin/post <collection> files*.json
        

        Arbitrary parameters after the second (file or URL) parameter are automatically turned into -D system properties when invoking SimplePostTool.
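The conversion described above can be sketched roughly as follows. This is a minimal illustration and not the code in the patch: the function name props_from_args is hypothetical, and the sketch assumes every extra argument already has the form key=value.

```shell
#!/bin/sh
# Hypothetical helper: turn trailing key=value arguments into -D system properties.
# Assumes the first two arguments are <collection> and <path or url>.
props_from_args() {
  # Refuse to run without the two leading positional arguments.
  [ "$#" -ge 2 ] || return 1
  shift 2                      # skip <collection> and <path or url>
  props=""
  for arg in "$@"; do
    props="$props -D$arg"      # recursive=1 becomes -Drecursive=1
  done
  # Strip the single leading space before printing.
  printf '%s\n' "${props# }"
}

# prints: -Drecursive=1 -Ddelay=1
props_from_args gettingstarted http://lucidworks.com recursive=1 delay=1
```

Joining the properties into one string keeps the sketch short, but it would break on values containing spaces; a real script would more likely pass them through as separate words.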

        Erik Hatcher added a comment - - edited

        With the latest patch, my example above becomes

        bin/post gettingstarted http://lucidworks.com recursive=1 delay=1
        

        And Tim's example becomes

        bin/post $EXAMPLE *.xml port=$SOLR_PORT
        
        Shalin Shekhar Mangar added a comment -

        +1

        Looks great!

        Erik Hatcher added a comment -

        Error checking is the tough part. Latest patch requires this syntax "bin/post <collection> <path or url> [optional params passed to SPT]", but if the user omits the collection name what then? Ugly error currently. I suppose it could check for the existence of the collection and issue a clean error message.

        Any objections or thoughts about committing it basically like this and iterating? I don't plan on making a comparable Windows version of this myself, but patches are welcome on that front.
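One lightweight check, short of asking Solr whether the collection exists, would be to validate the argument count and reject a first argument that looks like a path or URL. The sketch below is only an illustration: the function name check_args and both heuristics are assumptions, not what the patch does.

```shell
#!/bin/sh
# Hypothetical sketch of friendlier argument checking for a bin/post-style script.
check_args() {
  # The documented syntax needs at least <collection> and <path or url>.
  if [ "$#" -lt 2 ]; then
    echo "Usage: bin/post <collection> <path or url> [params]" >&2
    return 1
  fi
  # A collection name should not look like a URL.
  case "$1" in
    http://*|https://*)
      echo "First argument looks like a URL; did you omit the collection name?" >&2
      return 1 ;;
  esac
  # Heuristic: an existing file or directory is probably content, not a collection.
  if [ -e "$1" ]; then
    echo "First argument is an existing path; did you omit the collection name?" >&2
    return 1
  fi
  return 0
}

check_args gettingstarted http://lucidworks.com && echo "arguments look plausible"
```

The existing-path heuristic can misfire when a file happens to share the collection's name, which is one reason a check against Solr itself may still be preferable.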

        Anshum Gupta added a comment -

        I tried indexing *.xml and it just ended up indexing 1 file as there seems to be a problem with the expansion:

        solr $ bin/post gettingstarted example/exampledocs/*.xml
        Collection: gettingstarted
        PATH
        java -classpath example/solr-webapp/webapp/WEB-INF/lib/solr-core-6.0.0-SNAPSHOT.jar -Dc=gettingstarted -Dexample/exampledocs/hd.xml -Dexample/exampledocs/ipod_other.xml -Dexample/exampledocs/ipod_video.xml -Dexample/exampledocs/manufacturers.xml -Dexample/exampledocs/mem.xml -Dexample/exampledocs/money.xml -Dexample/exampledocs/monitor.xml -Dexample/exampledocs/monitor2.xml -Dexample/exampledocs/mp500.xml -Dexample/exampledocs/sd500.xml -Dexample/exampledocs/solr.xml -Dexample/exampledocs/utf8-example.xml -Dexample/exampledocs/vidcard.xml org.apache.solr.util.SimplePostTool example/exampledocs/gb18030-example.xml
        SimplePostTool version 1.5
        Posting files to base url http://localhost:8983/solr/gettingstarted/update using content-type application/xml..
        POSTing file gb18030-example.xml
        1 files indexed.
        COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update..
        Time spent: 0:00:00.074
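The command line echoed above suggests what went wrong: the shell expands the glob before bin/post runs, so the script receives fourteen file arguments, treats the first as the path and turns the remaining thirteen into -D properties. One possible way to tell the two kinds of argument apart is sketched below, assuming that optional parameters always contain an "=" and that file paths do not. The function name split_args is hypothetical, and this is not necessarily how the committed script handles the problem.

```shell
#!/bin/sh
# Hypothetical sketch: separate file/URL arguments from key=value parameters.
# Assumption: parameters contain '=' and paths do not.
split_args() {
  files=""
  props=""
  for arg in "$@"; do
    case "$arg" in
      *=*) props="$props -D$arg" ;;   # key=value becomes a system property
      *)   files="$files $arg" ;;     # anything else is content to post
    esac
  done
  printf 'props:%s\n' "$props"
  printf 'files:%s\n' "$files"
}

# prints:
#   props: -Dport=8983
#   files: example/exampledocs/hd.xml example/exampledocs/mem.xml
split_args example/exampledocs/hd.xml example/exampledocs/mem.xml port=8983
```

The "=" test is deliberately naive: a URL with a query string, or a file name containing "=", would be misclassified, so a real script would need a stricter rule.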

        ASF subversion and git services added a comment -

        Commit 1647928 from Erik Hatcher in branch 'dev/trunk'
        [ https://svn.apache.org/r1647928 ]

        SOLR-6435: Add script to simplify posting content to Solr

        Erik Hatcher added a comment -

        Put a simple stake in the ground on trunk with bin/post.

        TODOs: create a comparable bin/post.cmd for Windows; centralize common environment (like Java and variables) across bin/solr and bin/post; merge this to branch_5x.

        ASF subversion and git services added a comment -

        Commit 1648478 from Erik Hatcher in branch 'dev/trunk'
        [ https://svn.apache.org/r1648478 ]

        SOLR-6435: bin/post cleanup for 5x merge

        ASF subversion and git services added a comment -

        Commit 1648479 from Erik Hatcher in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1648479 ]

        SOLR-6435: Added bin/post

        Erik Hatcher added a comment -

        Pleasantly functional version committed to both 5x and trunk. 'nix only at this point (no .cmd version). A brief bit of code is copied from bin/solr with TODO to centralize what makes sense across scripts.

        Alexandre Rafalovitch added a comment -

        Is 'tehfiles' intentional or misspelling?

        bin/post tehfiles ~/Documents

        Also, are there any plans for something - anything - that issues a delete command?

        Erik Hatcher added a comment -

        Is 'tehfiles' intentional or misspelling?

        yeah, lol - was just a comment in the script, not official usage. We'll get official -help style usage output in the script as well (feel free to open a new JIRA if you're feeling it).

        Are there any plans for something - anything - that issues a delete command?

        What do you have in mind? One could still do that with the SimplePostTool, with

        java -Ddata=args -Dc=tehfiles -classpath dist/solr-core-*.jar org.apache.solr.util.SimplePostTool '<delete><id>42</id></delete>'

        However, bin/post does not (currently) support -Ddata=args. Certainly we wouldn't want a user to type in that XML incantation though, so I imagine there could be a bin/delete script that allowed for clean deleting by id or by query. I'm curious what scenarios and interface folks imagine for a friendlier delete facility.
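As one illustration of what a friendlier interface might wrap, the sketch below builds the XML delete message for either an id or a query. The function names and the id/query mode argument are hypothetical; only the <delete><id> and <delete><query> message shapes come from Solr's XML update format.

```shell
#!/bin/sh
# Hypothetical sketch: build a Solr XML delete message for an id or a query.
xml_escape() {
  # Escape the characters that are special in XML text content ('&' first).
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

delete_payload() {
  mode="$1"    # "id" or "query"
  value="$2"
  case "$mode" in
    id)    printf '<delete><id>%s</id></delete>\n' "$(xml_escape "$value")" ;;
    query) printf '<delete><query>%s</query></delete>\n' "$(xml_escape "$value")" ;;
    *)     echo "mode must be 'id' or 'query'" >&2; return 1 ;;
  esac
}

# prints: <delete><id>42</id></delete>
delete_payload id 42
# prints: <delete><query>name:ipod</query></delete>
delete_payload query 'name:ipod'
```

The resulting string could then be handed to SimplePostTool in -Ddata=args mode, as in the command above, so that the user never types the XML by hand.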

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee:
            Erik Hatcher
            Reporter:
            Erik Hatcher
          • Votes:
            0
            Watchers:
            7
