Nutch / NUTCH-1349

Make batchId explicit within debug logging and improve CLI

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      It is a pain to locate the batchId of URLs that are skipped when indexing to Solr. My DEBUG log output gives me

      2012-05-03 20:44:55,268 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.glasgowwheelers.com/; different batch id
      2012-05-03 20:44:55,259 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.heraldscotland.com/; different batch id
      

      when I would actually like

      2012-05-03 20:44:55,268 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.glasgowwheelers.com/; different batch id (ACTUAL BATCH ID)
      2012-05-03 20:44:55,259 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.heraldscotland.com/; different batch id (ACTUAL BATCH ID)
      

      Patch coming up soon.
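
      A minimal sketch of the requested message, assuming the value printed in parentheses is the mark stored with the skipped row (as suggested in the comments below) and not the batchId given on the command line. The class name, method names, and sample ids are hypothetical and are not taken from the Nutch source or the attached patch.

```java
/** Illustrative sketch only; not the committed NUTCH-1349 change. */
public class BatchIdLogSketch {

  /** Hypothetical stand-in for the check that decides whether a row belongs to the requested batch. */
  public static boolean shouldProcess(String mark, String batchId) {
    return mark != null && mark.equals(batchId);
  }

  /** Builds the debug message, reporting the row's own mark so the mismatch is visible. */
  public static String skipMessage(String url, String mark) {
    return "Skipping " + url + "; different batch id (" + mark + ")";
  }

  public static void main(String[] args) {
    String batchId = "batch-A"; // hypothetical batchId passed on the command line
    String mark = "batch-B";    // hypothetical mark stored with the row
    String url = "http://www.glasgowwheelers.com/";
    if (!shouldProcess(mark, batchId)) {
      System.out.println(skipMessage(url, mark));
    }
  }
}
```

      Printing the row's mark, rather than the requested batchId, is what makes the log line useful: the requested batchId is already known to whoever ran the job.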

      1. NUTCH-1349.patch
        9 kB
        Lewis John McGibbney
      2. NUTCH-1349-v2.patch
        9 kB
        Lewis John McGibbney
      3. NUTCH-1349-v2.patch
        9 kB
        Lewis John McGibbney

        Activity

        Hudson added a comment -

        Integrated in Nutch-nutchgora #248 (See https://builds.apache.org/job/Nutch-nutchgora/248/)
        Commit to address NUTCH-1349 and update to CHANGES.txt (Revision 1335436)

        Result = SUCCESS
        lewismc :
        Files :

        • /nutch/branches/nutchgora/CHANGES.txt
        • /nutch/branches/nutchgora/conf/log4j.properties
        • /nutch/branches/nutchgora/src/bin/nutch
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/WebTableReader.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/indexer/IndexerJob.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/parse/ParserJob.java
        Lewis John McGibbney added a comment -

        Committed @ revision 1335436 in Nutchgora branch

        Ferdy Galema added a comment -

        Looks good. +1

        When the need arises, we can always add it for similar jobs too.

        Lewis John McGibbney added a comment -

        Reattached with ASF licensing.

        Lewis John McGibbney added a comment -

        New patch.
        If you are happy, then I will commit. I checked the other jobs, and this seems to be all the logging I can improve for this specific issue and the debug option.

        Ferdy Galema added a comment -

        Good work on improving the CLI. As for displaying the mismatching batchId: your patch prints batchId, while you should use 'mark' instead.

        What do you mean by matching TableUtil.unreverseUrl(key)?

        Lewis John McGibbney added a comment -

        Trivial patch.
        One question, though: how can we find out the batchId of a given key which matches

        TableUtil.unreverseUrl(key)
        

        ?

        Lewis John McGibbney added a comment -

        Slight modification to the issue description. A bit of work ended up going into the CLI to make it prettier!
        I've also updated the log4j.properties file.

        I'm not truly happy with this, as it currently displays the batchId from the CLI input rather than the batchId of the key that doesn't match the input batchId! Does this make sense?

        Ferdy Galema added a comment -

        +1 This will also benefit other jobs depending on a batchId.


          People

          • Assignee:
            Lewis John McGibbney
            Reporter:
            Lewis John McGibbney
          • Votes:
            0
            Watchers:
            1
