  Nutch / NUTCH-2798

Nutch v2.4 Not Able to crawl after javax.faces.viewstate


Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Not A Bug
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels: None
    • Environment: Ubuntu MATE

    Description

      Nutch v2.4 is not crawling the HTML page past the input tag named javax.faces.viewstate. It crawls the content before this tag but cannot get past this javax viewstate field, whose value contains a lot of special characters.

      This page has several tabs. The crawler currently fetches information up to the date (Date Published: 06/30/2020 09:00 PM); after that, it is unable to fetch anything from the title "Assembly Bill No. 103" onward.
      I am crawling this site: http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200AB103
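      Nutch's fetcher truncates downloaded content to the byte limit set by the http.content.limit property (a negative value disables truncation). One plausible explanation for the symptom above, not confirmed in this report, is that the very large javax.faces.ViewState value pushes the rest of the page past that limit. The Python sketch below checks that hypothesis by finding where each marker string falls in the raw response. The helpers first_offset, survives_limit, and report are hypothetical names for illustration, not part of Nutch, and the marker strings and limit value are assumptions to adjust.

```python
import urllib.request

# Hypothetical helpers for illustration; none of these names come from Nutch.


def first_offset(body: bytes, marker: str) -> int:
    """Return the byte offset of the first case-insensitive match of marker, or -1."""
    return body.lower().find(marker.encode("utf-8").lower())


def survives_limit(body: bytes, marker: str, limit: int) -> bool:
    """Return True if the whole marker lies within the first `limit` bytes of body.

    A negative limit means "no truncation", matching how Nutch treats a
    negative http.content.limit.
    """
    offset = first_offset(body, marker)
    if offset == -1:
        return False  # marker not present in the body at all
    end = offset + len(marker.encode("utf-8"))
    return limit < 0 or end <= limit


def report(url: str, limit: int,
           markers=("javax.faces.ViewState", "Assembly Bill No. 103")) -> None:
    """Fetch url once and print where each marker falls relative to limit."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    print(f"page size: {len(body)} bytes, limit: {limit}")
    for marker in markers:
        offset = first_offset(body, marker)
        kept = survives_limit(body, marker, limit)
        print(f"{marker!r}: offset={offset}, within limit={kept}")
```

      For example, calling report() with the bill URL and the http.content.limit value actually configured in nutch-site.xml (or nutch-default.xml) shows whether "Assembly Bill No. 103" starts beyond the limit. If it does, raising http.content.limit, or setting it to -1, in nutch-site.xml would be the first thing to try.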

      This is the output I am getting after crawling.

      Attachments

        1. image-2020-07-09-21-07-32-811.png
          132 kB
          Mihir Sharma
        2. image-2020-07-09-19-43-28-586.png
          174 kB
          Mihir Sharma
        3. image-2020-07-06-20-22-07-351.png
          383 kB
          Mihir Sharma
        4. image-2020-07-06-20-20-49-580.png
          276 kB
          Mihir Sharma
        5. hadoop.log.2020-07-10
          340 kB
          Mihir Sharma

          People

            Assignee: Unassigned
            Reporter: Mihir Sharma (Mihir22)
            Votes: 0
            Watchers: 3
