NUTCH-987

Support HTTP auth for Solr communication

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      At the moment we cannot send data directly to a public Solr instance that is protected by HTTP auth. I have a work-in-progress patch that passes a configured HttpClient object to CommonsHttpSolrServer, and it works. This issue should add this ability to indexing, dedup and clean, and it should be configurable from a configuration file.

      Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:

      • solr.auth=true
      • solr.auth.username=USERNAME
      • solr.auth.password=PASSWORD
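
      The three parameters above can be read roughly as in the following minimal sketch. It is not the code from the patch: the class name SolrAuthSketch and the method basicAuthHeader are hypothetical, java.util.Properties stands in for the Nutch/Hadoop configuration, and the Basic scheme is an assumption (the issue only says "HTTP auth"). It shows only how the toggle and the two credentials could map to an Authorization header value, with authentication off unless solr.auth is set to true.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper, not part of Nutch: illustrates how solr.auth,
// solr.auth.username and solr.auth.password could be interpreted.
final class SolrAuthSketch {

    // Returns the value of an HTTP Basic "Authorization" header, or empty
    // when solr.auth is absent or not "true" (authentication off by default).
    static Optional<String> basicAuthHeader(Properties conf) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty("solr.auth", "false"));
        if (!enabled) {
            return Optional.empty();
        }
        String user = conf.getProperty("solr.auth.username");
        String pass = conf.getProperty("solr.auth.password");
        if (user == null || pass == null) {
            throw new IllegalStateException(
                "solr.auth is true but username or password is missing");
        }
        // Basic scheme: base64("username:password"), assumed here for illustration.
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + token);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("solr.auth", "true");
        conf.setProperty("solr.auth.username", "user");
        conf.setProperty("solr.auth.password", "pass");
        // prints "Basic dXNlcjpwYXNz"
        System.out.println(basicAuthHeader(conf).orElse("<auth disabled>"));
    }
}
```

      The patch itself hands a configured HttpClient to CommonsHttpSolrServer rather than building a header by hand; the sketch only makes the meaning of the three settings concrete.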

      Attachments

      1. NUTCH-987-2.0-2.patch (9 kB, Lewis John McGibbney)
      2. NUTCH-987-2.0-2.patch (9 kB, Lewis John McGibbney)
      3. NUTCH-987-2.0-1.patch (12 kB, Markus Jelsma)
      4. NUTCH-987-1.4-3.patch (8 kB, Markus Jelsma)
      5. NUTCH-987-1.4.1-2.patch (7 kB, Markus Jelsma)
      6. SolrUtils.java (3 kB, Markus Jelsma)
      7. NUTCH-987-1.3-hack.patch (4 kB, Markus Jelsma)

        Issue Links

          Activity

          Markus Jelsma created issue -
          Markus Jelsma added a comment -

          Attached nasty hack for the sake of not losing it.

          Markus Jelsma made changes -
          Field Original Value New Value
          Attachment NUTCH-987-1.3-hack.patch [ 12477496 ]
          Markus Jelsma made changes -
          Fix Version/s 2.0 [ 12314893 ]
          Markus Jelsma made changes -
          Fix Version/s 1.4 [ 12316519 ]
          Markus Jelsma made changes -
          Patch Info [Patch Available]
          Markus Jelsma added a comment - - edited

          Patch for 1.4. Also moved UTF-8 strip method to Solr utils. It's implemented using simple job properties, no fancy AuthScope stuff. Please comment.
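
          For readers unfamiliar with the strip method mentioned here, the following is a minimal sketch of what such a filter might look like. It assumes the method drops Unicode noncharacter code points and non-printable control characters (keeping tab, line feed and carriage return) before text is sent to Solr; the class name StripSketch is hypothetical and the exact filter in the patch may differ.

```java
// Hypothetical sketch, not the code from the patch.
final class StripSketch {

    // Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points
    // of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    static boolean isNonCharacter(int cp) {
        return (cp & 0xFFFE) == 0xFFFE || (cp >= 0xFDD0 && cp <= 0xFDEF);
    }

    // Control characters below U+0020, except tab, line feed and carriage return.
    static boolean isDisallowedControl(int cp) {
        return cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r';
    }

    // Copies the input code point by code point, skipping the filtered ones.
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !isNonCharacter(cp) && !isDisallowedControl(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+FFFF and U+0000 are dropped, the tab is kept.
        System.out.println(stripNonCharCodepoints("a\uFFFFb\u0000c\td"));
    }
}
```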

          Markus Jelsma made changes -
          Attachment NUTCH-987-1.4.1-1.patch [ 12486055 ]
          Attachment SolrUtils.java [ 12486056 ]
          Markus Jelsma added a comment -

          Some instances in SolrDedup were missing. Also cleaned up some mess.

          Markus Jelsma made changes -
          Attachment NUTCH-987-1.4.1-2.patch [ 12486061 ]
          Markus Jelsma made changes -
          Attachment NUTCH-987-1.4.1-1.patch [ 12486055 ]
          Markus Jelsma added a comment -

          Are there objections? Pointers? Comments?

          Lewis John McGibbney added a comment -

          Based upon the current patch you provided, I think this is a good suggestion for inclusion. I am not currently using an auth-protected Solr core in production, but I will get authentication set up in development and get this tested, Markus. It makes sense to include it now, as it will inevitably become a requested feature in the future.

          Further to this, to address your initial question, I agree with the comments regarding where to configure the auth credentials for Solr communication; quite simply, I cannot think of any other solution that would do anything other than add clutter.

          Markus Jelsma added a comment -

          If there are no objections, I'll send this one in together with NUTCH-1036. This patch includes the changes made for NUTCH-1036, adding reporter increments here and there.

          Markus Jelsma made changes -
          Attachment NUTCH-987-1.4-3.patch [ 12486303 ]
          Julien Nioche added a comment -

          Don't forget to add the parameters you introduced to nutch-default.xml (with authentication off by default).
          +1 otherwise

          Thanks!

          Markus Jelsma added a comment -

          The previous patch has the config change for nutch-default; I missed it in the last patch. Thanks Lewis and Julien!

          Markus Jelsma made changes -
          Link This issue incorporates NUTCH-1036 [ NUTCH-1036 ]
          Markus Jelsma added a comment -

          Committed for 1.4 in rev. 1146035.

          Markus Jelsma made changes -
          Description

          Original Value:
          At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.

          The question is, is the current httpclient-auth.xml the correct place? It does provide a nice means to configure the AuthScope objects but it is used for fetching. But, since AuthScope is used we could easily add the credentials for Solr there as well and add a new nutch-default option for toggling HTTP auth.

          Thoughts?

          New Value:
          At the moment we cannot send data directly to a public HTTP auth protected Solr instance. I've a WIP that passes a configured HTTPClient object to CommonsHttpSolrServer, it works. This issue should add this ability to indexing, dedup and clean and be configured from some configuration file.

          Enable Solr HTTP auth communication by setting the following parameters in your nutch-site config:
          * solr.auth=true
          * solr.auth.username=USERNAME
          * solr.auth.password=PASSWORD
          Julien Nioche added a comment -

          Hi Markus, will this be committed to trunk as well?

          Markus Jelsma added a comment -

          Yes, at least partially. Solrclean isn't finished yet and dedup is broken.

          Markus Jelsma added a comment -

          Partial patch for 2.0 includes support for HTTP auth for solrindex and solrdedup and includes NUTCH-1026 and NUTCH-1036.

          Anyone who can test this would make me happy.

          Markus Jelsma made changes -
          Attachment NUTCH-987-2.0-1.patch [ 12486592 ]
          Markus Jelsma made changes -
          Link This issue is part of NUTCH-979 [ NUTCH-979 ]
          Lewis John McGibbney made changes -
          Attachment NUTCH-987-2.0-2.patch [ 12490230 ]
          Lewis John McGibbney added a comment -

          Hi Markus, the patch for 2.0 does not apply cleanly for me. I get the following output:

          lewis@lewis-01:~/ASF/trunk$ patch -p0 -i NUTCH-987-2.0-1.patch
          patching file src/java/org/apache/nutch/indexer/solr/SolrUtils.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrConstants.java
          patching file src/java/org/apache/nutch/indexer/solr/SolrWriter.java
          patching file src/java/org/apache/nutch/indexer/IndexerReducer.java
          patching file conf/nutch-default.xml
          Hunk #1 FAILED at 728.
          Hunk #2 succeeded at 1060 (offset 13 lines).
          1 out of 2 hunks FAILED -- saving rejects to file conf/nutch-default.xml.rej
          

          I therefore attach an updated patch which applies cleanly; however, it breaks runtime builds and the (already broken) tests with the following output:

          [javac] Compiling 5 source files to /home/lewis/ASF/trunk/build/classes
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:231: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:261: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates.SolrInputFormat
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java:306: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrDeleteDuplicates
              [javac]        solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]               ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java:70: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrIndexerJob
              [javac]       SolrServer solr = SolrUtils.getCommonsHttpSolrServer(getConf());
              [javac]                         ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:51: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
              [javac]     solr = SolrUtils.getCommonsHttpSolrServer(conf);
              [javac]            ^
              [javac] /home/lewis/ASF/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java:64: cannot find symbol
              [javac] symbol  : variable SolrUtils
              [javac] location: class org.apache.nutch.indexer.solr.SolrWriter
              [javac]           val2 = SolrUtils.stripNonCharCodepoints((String)val);
              [javac]                  ^
              [javac] 6 errors
          
          BUILD FAILED
          /home/lewis/ASF/trunk/build.xml:96: Compile failed; see the compiler error output for details.
          
          Lewis John McGibbney made changes -
          Attachment NUTCH-987-2.0-2.patch [ 12490231 ]
          Markus Jelsma added a comment -

          Ah the config. You can easily add the config params yourself but they're not strictly required as the code already uses the same defaults.

          I'm off!! Cheers!

          Markus Jelsma made changes -
          Fix Version/s 2.0 [ 12314893 ]
          Markus Jelsma added a comment - - edited

          Resolved for 1.4, see NUTCH-1104 for 2.0

          Markus Jelsma made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Markus Jelsma made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition            Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> Resolved      132d 20h 7m             1                 Markus Jelsma   06/Sep/11 12:57
          Resolved -> Closed    7d 9h 29m               1                 Markus Jelsma   13/Sep/11 22:26

            People

            • Assignee: Markus Jelsma
            • Reporter: Markus Jelsma
            • Votes: 0
            • Watchers: 0