Nutch / NUTCH-987

Support HTTP auth for Solr communication

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      At the moment we cannot send data directly to a public, HTTP-auth-protected Solr instance. I have a work-in-progress (WIP) patch that passes a configured HttpClient object to CommonsHttpSolrServer, and it works. This issue should add that ability to the indexing, dedup and clean jobs, configured from a configuration file.

      Enable Solr HTTP auth communication by setting the following parameters in your nutch-site.xml configuration:

      • solr.auth=true
      • solr.auth.username=USERNAME
      • solr.auth.password=PASSWORD
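      As a rough illustration of how these three properties fit together, here is a minimal, self-contained Java sketch. It assumes a plain java.util.Properties object in place of Nutch's Hadoop Configuration, and the names SolrAuthSketch and basicAuthHeader are hypothetical. It builds an HTTP Basic "Authorization" header value by hand; it is not how the patches work, since the WIP described above instead passes a configured Commons HttpClient to CommonsHttpSolrServer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch, not Nutch code: turns the solr.auth* properties into an
// HTTP Basic "Authorization" header value. The actual patches configure a
// Commons HttpClient for CommonsHttpSolrServer instead.
final class SolrAuthSketch {

  /** Returns the header value, or empty when solr.auth is not "true". */
  static Optional<String> basicAuthHeader(Properties conf) {
    // A missing solr.auth is treated as false, i.e. authentication is off by default.
    boolean enabled = Boolean.parseBoolean(conf.getProperty("solr.auth", "false"));
    if (!enabled) {
      return Optional.empty();
    }
    String user = conf.getProperty("solr.auth.username");
    String pass = conf.getProperty("solr.auth.password");
    if (user == null || pass == null) {
      // Fail fast instead of silently sending unauthenticated requests.
      throw new IllegalArgumentException(
          "solr.auth=true requires solr.auth.username and solr.auth.password");
    }
    // HTTP Basic scheme: base64 of "username:password".
    String token = Base64.getEncoder()
        .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
    return Optional.of("Basic " + token);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("solr.auth", "true");
    conf.setProperty("solr.auth.username", "USERNAME");
    conf.setProperty("solr.auth.password", "PASSWORD");
    System.out.println(basicAuthHeader(conf).orElse("(auth disabled)"));
  }
}
```

      Treating a missing solr.auth as false keeps authentication off unless it is explicitly enabled, which matches the "off by default" behaviour requested later in this thread.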

      Attachments

      1. NUTCH-987-1.3-hack.patch (4 kB, Markus Jelsma)
      2. NUTCH-987-1.4.1-2.patch (7 kB, Markus Jelsma)
      3. NUTCH-987-1.4-3.patch (8 kB, Markus Jelsma)
      4. NUTCH-987-2.0-1.patch (12 kB, Markus Jelsma)
      5. NUTCH-987-2.0-2.patch (9 kB, Lewis John McGibbney)
      6. NUTCH-987-2.0-2.patch (9 kB, Lewis John McGibbney)
      7. SolrUtils.java (3 kB, Markus Jelsma)


          Activity

          Markus Jelsma added a comment:

          Attached a nasty hack for the sake of not losing it.

          Markus Jelsma added a comment (edited):

          Patch for 1.4. Also moved the UTF-8 strip method to SolrUtils. Auth is implemented using simple job properties, with no fancy AuthScope stuff. Please comment.
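          The UTF-8 strip method mentioned above is not shown in this thread. The following is a hypothetical Java sketch of what such a helper might look like, assuming it should drop Unicode noncharacters (U+FDD0 to U+FDEF, and every code point ending in FFFE or FFFF) plus C0 control characters other than tab, LF and CR. The class name NonCharStripper is made up; only the method name stripNonCharCodepoints appears in this thread, in the compiler output further down.

```java
// Hypothetical sketch, not the Nutch SolrUtils source: removes Unicode
// noncharacters and C0 control characters other than tab, LF and CR.
final class NonCharStripper {

  /** U+FDD0..U+FDEF plus U+xFFFE and U+xFFFF in every plane. */
  static boolean isNoncharacter(int cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
  }

  /** C0 controls (below U+0020) except tab, line feed and carriage return. */
  static boolean isUnwantedControl(int cp) {
    return cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r';
  }

  static String stripNonCharCodepoints(String input) {
    StringBuilder out = new StringBuilder(input.length());
    // Walk code points (not chars) so supplementary characters stay intact.
    input.codePoints()
        .filter(cp -> !isNoncharacter(cp) && !isUnwantedControl(cp))
        .forEach(out::appendCodePoint);
    return out.toString();
  }

  public static void main(String[] args) {
    String dirty = "a\uFFFEb\u0001c\td\uFDD0e";
    System.out.println(stripNonCharCodepoints(dirty)); // prints "abc", a tab, then "de"
  }
}
```

          Iterating by code point rather than by char is a design choice of this sketch: it lets the plane-independent FFFE/FFFF check catch noncharacters such as U+1FFFF while leaving ordinary supplementary characters untouched.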

          Markus Jelsma added a comment:

          Some instances in SolrDedup were missing. Also cleaned up some mess.

          Markus Jelsma added a comment:

          Are there objections? Pointers? Comments?

          Lewis John McGibbney added a comment:

          Based on the current patch you provided, I think this is a good candidate for inclusion. I am not currently using an auth-protected Solr core in production, but I will set up authentication in development and get this tested, Markus. It makes sense to include it now, as it will inevitably become a requested feature in the future.

          Further, to address your initial question: I agree with the comments regarding where to configure the auth credentials for Solr communication. Quite simply, I cannot think of any alternative location that would not just add clutter.

          Markus Jelsma added a comment:

          If there are no objections, I'll commit this one together with NUTCH-1036. This patch includes the changes made for NUTCH-1036, which add reporter increments here and there.

          Julien Nioche added a comment:

          Don't forget to add the parameters you introduced to nutch-default.xml (with authentication off by default).
          +1 otherwise

          Thanks!
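          The request above relies on Nutch's configuration layering: NutchConfiguration loads nutch-default.xml first and nutch-site.xml after it, so a value set in the site file overrides the default. The sketch below only mimics that lookup order, using java.util.Properties and its defaults table in place of Hadoop's Configuration; LayeredConfigSketch is a hypothetical name, not Nutch or Hadoop code.

```java
import java.util.Properties;

// Illustration only, not Nutch or Hadoop code: java.util.Properties with a
// defaults table stands in for nutch-default.xml overridden by nutch-site.xml.
final class LayeredConfigSketch {

  /** Site entries win; anything missing falls back to the defaults. */
  static Properties layered(Properties defaults, Properties site) {
    Properties merged = new Properties(defaults);
    merged.putAll(site);
    return merged;
  }

  public static void main(String[] args) {
    Properties nutchDefault = new Properties();
    nutchDefault.setProperty("solr.auth", "false"); // off by default

    Properties nutchSite = new Properties();
    System.out.println(layered(nutchDefault, nutchSite).getProperty("solr.auth")); // false

    nutchSite.setProperty("solr.auth", "true"); // a site file enabling auth
    System.out.println(layered(nutchDefault, nutchSite).getProperty("solr.auth")); // true
  }
}
```

          Listing the parameters in the defaults file mainly documents them; if the code applies the same fallback values itself, leaving them out changes nothing at runtime.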

          Markus Jelsma added a comment:

          The previous patch has the config change for nutch-default.xml; I missed it in the last patch. Thanks, Lewis and Julien!

          Markus Jelsma added a comment:

          Committed for 1.4 in rev. 1146035.

          Julien Nioche added a comment:

          Hi Markus, will this be committed to trunk as well?

          Markus Jelsma added a comment:

          Yes, at least partially. Solrclean isn't finished yet and dedup is broken.

          Markus Jelsma added a comment:

          Partial patch for 2.0: adds HTTP auth support to solrindex and solrdedup, and includes NUTCH-1026 and NUTCH-1036.

          Anyone who can test this would make me happy.

          Lewis John McGibbney added a comment:

          Hi Markus, the patch for 2.0 does not apply cleanly for me; I get the following output:

          lewis@lewis-01:~/ASF/trunk$ patch -p0 -i NUTCH-987-2.0-1.patch
          patching file src/java/org/apache/nutch/indexer/solr/SolrUtils.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrConstants.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrWriter.java
          patching file src/java/org/apache/nutch/indexer/IndexerReducer.java
          patching file conf/nutch-default.xml
          Hunk #1 FAILED at 728.
          Hunk #2 succeeded at 1060 (offset 13 lines).
          1 out of 2 hunks FAILED -- saving rejects to file conf/nutch-default.xml.rej
          

          I therefore attach an updated patch which applies cleanly; however, it breaks runtime builds and the (already broken) tests with the following output:

          [javac] Compiling 5 source files to /home/lewis/ASF/trunk/build/classes
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:231: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:261: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:306: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
              [javac]        solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]               ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java:70: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrIndexerJob
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(getConf());
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:51: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
              [javac]     solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]            ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:64: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
              [javac]           val2 = SolrUtils.stripNonCharCodepoints((String)val);
              [javac]                  ^
              [javac] 6 errors
          
          BUILD FAILED
          /home/lewis/ASF/trunk/build.xml:96: Compile failed; see the compiler error output for details.
          
          Markus Jelsma added a comment:

          Ah, the config. You can easily add the config params yourself, but they're not strictly required, as the code already uses the same defaults.

          I'm off!! Cheers!

          Markus Jelsma added a comment (edited):

          Resolved for 1.4; see NUTCH-1104 for 2.0.


            People

            • Assignee: Markus Jelsma
            • Reporter: Markus Jelsma
            • Votes: 0
            • Watchers: 0
