Solr / SOLR-14959

Getting an error trying to web crawl a website


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 8.6.3
    • Fix Version/s: None
    • website
    • Environment: OS: Mac

    Description

      Hi,

      I am getting the following error when trying to crawl a website. Please point me in the right direction.

      Ravishers-MacBook-Air:solr-8.6.3 ravishersingh$ bin/post -c solrhelp -filetypes html https://factorpad.com/tech/solr/index.html

      java -classpath /Users/ravishersingh/desktop/solr-8.6.3/dist/solr-core-8.6.3.jar -Dauto=yes -Dfiletypes=html -Dc=solrhelp -Ddata=web org.apache.solr.util.SimplePostTool https://factorpad.com/tech/solr/index.html

      SimplePostTool version 5.0.0

      Posting web pages to Solr url http://localhost:8983/solr/solrhelp/update/extract

      Entering auto mode. Indexing pages with content-types corresponding to file endings html

      Entering crawl at level 0 (1 links total, 1 new)

      SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html

      SimplePostTool: WARNING: Response: <html>
      <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
      <title>Error 404 Not Found</title>
      </head>
      <body><h2>HTTP ERROR 404 Not Found</h2>
      <table>
      <tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
      <tr><th>STATUS:</th><td>404</td></tr>
      <tr><th>MESSAGE:</th><td>Not Found</td></tr>
      <tr><th>SERVLET:</th><td>default</td></tr>
      </table>
      </body>
      </html>

      SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html

      SimplePostTool: WARNING: An error occurred while posting https://factorpad.com/tech/solr/index.html

      0 web pages indexed.

      COMMITting Solr index changes to http://localhost:8983/solr/solrhelp/update/extract...

      SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?commit=true

      SimplePostTool: WARNING: Response: <html>
      <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
      <title>Error 404 Not Found</title>
      </head>
      <body><h2>HTTP ERROR 404 Not Found</h2>
      <table>
      <tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
      <tr><th>STATUS:</th><td>404</td></tr>
      <tr><th>MESSAGE:</th><td>Not Found</td></tr>
      <tr><th>SERVLET:</th><td>default</td></tr>
      </table>
      </body>
      </html>

      Time spent: 0:00:01.356
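The 404 page reports `SERVLET: default`, which suggests that no Solr request handler answered at `/solr/solrhelp/update/extract`. The issue record does not state the root cause. One common cause (an assumption here) is that the collection's `solrconfig.xml` does not declare a handler named `/update/extract`. Per the log, `bin/post` sends web pages to that path, and Solr Cell's `ExtractingRequestHandler` normally serves it. The Java sketch below takes the failing URL from the log, extracts the handler path and decoded `literal.*` parameters, and checks whether a given `solrconfig.xml` text declares a `<requestHandler>` with that name. The class and method names are hypothetical, the sample config string is made up, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper; not part of Solr or SimplePostTool.
public class ExtractHandlerCheck {

    /** Returns the handler path after /solr/<collection>, e.g. "/update/extract". */
    static String handlerPath(String requestUrl) {
        String path = URI.create(requestUrl).getPath();   // "/solr/solrhelp/update/extract"
        String[] parts = path.split("/", 4);              // ["", "solr", "solrhelp", "update/extract"]
        if (parts.length < 4) {
            throw new IllegalArgumentException("Expected /solr/<collection>/<handler>: " + path);
        }
        return "/" + parts[3];
    }

    /** Decodes the query string into parameter name/value pairs, keeping their order. */
    static Map<String, String> queryParams(String requestUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(requestUrl).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** True if the solrconfig.xml text declares a <requestHandler> with the given name. */
    static boolean registersHandler(String solrconfigXml, String handlerName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
        NodeList handlers = doc.getElementsByTagName("requestHandler");
        for (int i = 0; i < handlers.getLength(); i++) {
            Element handler = (Element) handlers.item(i);
            if (handlerName.equals(handler.getAttribute("name"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Failing request URL copied from the SimplePostTool log in this issue.
        String failing = "http://localhost:8983/solr/solrhelp/update/extract"
                + "?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html"
                + "&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html";
        String path = handlerPath(failing);
        System.out.println("Handler requested: " + path);
        System.out.println("Parameters: " + queryParams(failing));

        // Hypothetical solrconfig.xml excerpt; in practice, read the collection's own file.
        String solrconfig = "<config><requestHandler name=\"/select\" class=\"solr.SearchHandler\"/></config>";
        System.out.println("Declared in solrconfig.xml: " + registersHandler(solrconfig, path));
    }
}
```

Running `main` prints the requested handler path (`/update/extract`), the decoded `literal.id` and `literal.url` values (both equal to the crawled page URL), and `false` for the sample config. The check only reads the XML text. Handlers that Solr registers implicitly, or that were added through the Config API (stored in `configoverlay.json`), will not be found. A `false` result is therefore a hint to compare the collection's config against the Solr Cell section of the Reference Guide, not a diagnosis.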


          People

            Assignee: Unassigned
            Reporter: Ravisher Singh (ravinsher)
            Votes: 0
            Watchers: 2
