Solr / SOLR-7268

Add a script to pipe data from other programs or files to Solr using SolrJ

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Workaround
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: scripts and tools, SolrJ
    • Labels: None

    Description

      I should be able to pipe JSON, XML, CSV, or any other format accepted by the /update/* handlers to a command that in turn uses SolrJ to send the docs to the correct leader in native format.
      In the following examples, all connection details of the cluster are put into a file called solrj.properties.
      Example:

      #post a file
      cat myjson.json | bin/post -c gettingstarted -s http://localhost:8983/solr 
      #or a producer program
      myprogram | bin/post  -c gettingstarted -s http://localhost:8983/solr
      

      The behavior of the script would be identical to posting the request directly to Solr at the specified qt. Every parameter the request handler accepts would be accepted in -<param-name>=<param-value> format. The same settings could be put into a properties file called indexer.properties and passed with the -p parameter. The script would expect the following extra properties: zk.url for cloud or solr.url for standalone.
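
As a rough illustration of the property handling described above, the following Python sketch reads key=value settings, decides between cloud and standalone mode from zk.url or solr.url, and treats everything else as request handler parameters. The function names are hypothetical, and the choice to prefer zk.url when both properties are present is an assumption; the issue does not specify either.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_target(props):
    """Pick the connection mode from the two properties named in the description.

    Assumption: zk.url takes precedence if both are set.
    """
    if "zk.url" in props:
        return ("cloud", props["zk.url"])
    if "solr.url" in props:
        return ("standalone", props["solr.url"])
    raise ValueError("expected zk.url (cloud) or solr.url (standalone)")


def handler_params(props):
    """Everything that is not a connection property is passed to the request handler."""
    connection_keys = {"zk.url", "solr.url"}
    return {k: v for k, v in props.items() if k not in connection_keys}
```

For example, a file containing zk.url=localhost:9983 and commit=true would select cloud mode and forward only commit=true to the handler.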

        Activity

          janhoy Jan Høydahl added a comment -

          post.jar already reads from stdin if you pass -Ddata=stdin. Does not use SolrJ though, but perhaps it is time for bin/post to start using SolrJ?

          The open-ended -<param-name>=<param-value> is scary if a request handler's param overlaps with script args.

          noble.paul Noble Paul added a comment -

          You are right, the -<param-name>=<param-value> format can lead to conflicts. We can just have a generic param like -params key1=val1&key2=val2 etc. Anyway, nobody has picked up the implementation yet.
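
A minimal sketch of the generic -params idea, assuming the script takes -c and -s as in the examples in the description: the request handler parameters travel as one query-string-style argument, so a handler parameter named c cannot collide with the script's own -c. The parser layout and function name are hypothetical and do not reflect any existing bin/post implementation.

```python
import argparse
from urllib.parse import parse_qsl


def parse_args(argv):
    """Split script options from request handler parameters."""
    parser = argparse.ArgumentParser(prog="post")
    parser.add_argument("-c", dest="collection", required=True)
    parser.add_argument("-s", dest="solr_url")
    # All handler parameters arrive as a single key1=val1&key2=val2 string.
    parser.add_argument("-params", dest="params", default="")
    args = parser.parse_args(argv)
    # parse_qsl returns a list of (key, value) pairs and keeps repeated keys.
    handler_params = parse_qsl(args.params, keep_blank_values=True)
    return args, handler_params
```

Calling parse_args(["-c", "gettingstarted", "-params", "commit=true&c=other"]) keeps gettingstarted as the collection while c=other is passed through untouched as a handler parameter.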

          epugh Eric Pugh added a comment -

          You may be interested in the work in SOLR-14673 to support bin/solr stream. The same example above would be:

          cat mycsv.csv | bin/solr stream -e local 'update(gettingstarted,parseCSV(stdin()))'

          That could be a myjson.json file and a parseJSON(stdin()) call as well. You can also do other manipulations as needed right in the expression.

          epugh Eric Pugh added a comment -

          I am going to resolve this in favour of the Solr streaming effort. However, if we want to move forward with this, please do re-open it!

          epugh Eric Pugh added a comment -

          The upcoming bin/solr stream command will handle this.


          People

            Assignee: Noble Paul
            Reporter: Noble Paul
            Votes: 0
            Watchers: 5
