NUTCH-252

Launching a segread/readdb command kills any running nutch commands


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.8
    • Fix Version/s: 0.8
    • Component/s: None
    • Labels: None
    • Environment: multi-box installation using DFS (1 jobtracker/namenode master, 10 tasktracker/datanode slaves)

Description

    I use a simple script to conduct a whole-web crawl (generate, fetch, updatedb, and repeat until the target depth is reached). While this is running, I monitor the progress via the jobtracker's browser-based UI. Sometimes there is a fairly long pause between one MapReduce job completing and the next one launching, so I mistakenly assume the target depth has been reached. I then launch a segread -list or readdb -stats command to summarize the results. Doing so apparently kills any active jobs with absolutely no warning in any of the logs, the console output, or the jobtracker's UI. The jobs just stop writing to the logs and any child processes disappear. Usually, the jobtracker and tasktrackers remain up and respond to subsequent commands.
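
    For reference, the crawl loop looks roughly like the following. This is a minimal sketch under stated assumptions: the crawldb/segments paths, the depth, and the newest-segment selection are illustrative, not the reporter's actual script, and the segread arguments are left as quoted in the report.

        #!/bin/sh
        # Minimal sketch of the whole-web crawl loop described above.
        # CRAWLDB, SEGMENTS, and DEPTH are illustrative assumptions.
        CRAWLDB=crawl/crawldb
        SEGMENTS=crawl/segments
        DEPTH=5

        i=1
        while [ "$i" -le "$DEPTH" ]; do
            bin/nutch generate $CRAWLDB $SEGMENTS       # launches a MapReduce job
            # Segment directories are timestamped, so the newest sorts last.
            SEGMENT=$SEGMENTS/`ls $SEGMENTS | tail -1`
            bin/nutch fetch $SEGMENT                    # another MapReduce job
            bin/nutch updatedb $CRAWLDB $SEGMENT        # and another
            i=`expr $i + 1`
        done

        # Running either inspection command below while the loop is still
        # active is what triggers the reported kill (segread arguments
        # omitted, as in the report):
        bin/nutch readdb $CRAWLDB -stats
        bin/nutch segread -list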


People

    • Assignee: Unassigned
    • Reporter: schmed (Chris Schneider)
    • Votes: 0
    • Watchers: 1
