Nutch / NUTCH-1349

Make batchId explicit within debug logging and improve CLI

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I find it a pain to locate the batchId of URLs that are skipped when indexing into Solr. My DEBUG log output gives me:

      2012-05-03 20:44:55,268 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.glasgowwheelers.com/; different batch id
      2012-05-03 20:44:55,259 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.heraldscotland.com/; different batch id
      

      when I would actually like

      2012-05-03 20:44:55,268 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.glasgowwheelers.com/; different batch id (ACTUAL BATCH ID)
      2012-05-03 20:44:55,259 DEBUG indexer.IndexerJob (IndexerJob.java:map(83)) - Skipping http://www.heraldscotland.com/; different batch id (ACTUAL BATCH ID)
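The requested change amounts to appending the batch id to the existing skip message. Below is a minimal Java sketch of that idea, assuming a hypothetical helper named skipMessage (not part of Nutch) and using java.util.logging's FINE level as a stand-in for the DEBUG level of Nutch's log4j setup. Which batch id belongs in the parentheses is settled in the comments below.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SkipMessageDemo {
    private static final Logger LOG = Logger.getLogger(SkipMessageDemo.class.getName());

    // Hypothetical helper (not Nutch code): the existing DEBUG text with the
    // batch id appended in parentheses.
    static String skipMessage(String url, String batchId) {
        return "Skipping " + url + "; different batch id (" + batchId + ")";
    }

    // Build the message only when debug-level (FINE) output is enabled, so
    // the extra detail costs nothing when debugging is switched off.
    static void logSkip(String url, String batchId) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(skipMessage(url, batchId));
        }
    }

    public static void main(String[] args) {
        // Placeholder id, as in the desired output above.
        System.out.println(skipMessage("http://www.glasgowwheelers.com/", "ACTUAL BATCH ID"));
        logSkip("http://www.glasgowwheelers.com/", "ACTUAL BATCH ID");
    }
}
```

The isLoggable guard is a design choice of this sketch: it keeps the string concatenation out of normal (non-debug) runs.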
      

      Patch coming up soon.

      1. NUTCH-1349.patch
        9 kB
        Lewis John McGibbney
      2. NUTCH-1349-v2.patch
        9 kB
        Lewis John McGibbney
      3. NUTCH-1349-v2.patch
        9 kB
        Lewis John McGibbney

        Activity

        ferdy.g Ferdy Galema added a comment -

        +1 This will also benefit other jobs that depend on a batchId.

        lewismc Lewis John McGibbney added a comment -

        I've slightly modified the issue description. A bit of work ended up going into the CLI, making it prettier!
        I've also updated the log4j.properties file.

        I'm not entirely happy with this, as it currently displays the batchId given as CLI input rather than the batchId of the key that doesn't match the input batchId! Does this make sense?

        lewismc Lewis John McGibbney added a comment -

        Trivial patch.
        One question I have, though: how can we find out the batchId of a given key which matches TableUtil.unreverseUrl(key)?
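For context on this question: the log line prints TableUtil.unreverseUrl(key), i.e. the row key turned back into a readable URL. The round trip below is a hypothetical reimplementation for illustration only, not Nutch's TableUtil. It assumes keys of the form reversedHost:protocol[:port]path (for example com.heraldscotland.www:http/) and ignores edge cases such as IP-address hosts or user info. Under this assumed layout the key carries only the URL, not a batch id, so the batch id of a skipped row has to come from data stored on the row itself.

```java
import java.net.URI;

public class ReversedKeyDemo {

    // "www.heraldscotland.com" -> "com.heraldscotland.www" (and vice versa).
    static String reverseHost(String host) {
        String[] parts = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) sb.append('.');
        }
        return sb.toString();
    }

    // Hypothetical key builder: reversedHost:protocol[:port]path[?query]
    static String reverseUrl(String url) {
        URI u = URI.create(url);
        StringBuilder key = new StringBuilder(reverseHost(u.getHost()));
        key.append(':').append(u.getScheme());
        if (u.getPort() != -1) key.append(':').append(u.getPort());
        String path = u.getRawPath();
        key.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) key.append('?').append(u.getRawQuery());
        return key.toString();
    }

    // Inverse of reverseUrl for the same assumed key layout.
    static String unreverseUrl(String key) {
        int slash = key.indexOf('/');              // the path always starts with '/'
        String[] head = key.substring(0, slash).split(":");
        String host = reverseHost(head[0]);        // reversing twice restores the order
        String port = head.length > 2 ? ":" + head[2] : "";
        return head[1] + "://" + host + port + key.substring(slash);
    }

    public static void main(String[] args) {
        String key = reverseUrl("http://www.heraldscotland.com/");
        System.out.println(key);
        System.out.println(unreverseUrl(key));
    }
}
```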

        ferdy.g Ferdy Galema added a comment -

        Good work on improving the CLI. Regarding displaying the mismatching batchId: your patch prints batchId, whereas you should print 'mark' instead.

        What do you mean by matching TableUtil.unreverseUrl(key)?
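Ferdy's point about 'mark' can be illustrated with a small Java sketch. It assumes, as his comment implies, that the mark holds the batch id recorded on the row by an earlier step, while batchId is the value supplied on the command line. The class and method names and the batch ids are hypothetical; this is not the actual IndexerJob code.

```java
import java.util.Optional;

public class BatchMarkDemo {

    /**
     * Hypothetical check: returns a DEBUG message if the row should be
     * skipped, or empty if it belongs to the requested batch.
     *
     * @param url     unreversed URL of the row (for the log line)
     * @param batchId batch id requested on the command line
     * @param mark    batch id recorded on the row; may be null if never marked
     */
    static Optional<String> skipMessage(String url, String batchId, String mark) {
        if (batchId.equals(mark)) {
            return Optional.empty();   // same batch: process the row
        }
        // Report the row's own mark rather than the CLI batchId, so the log
        // says which batch the skipped URL actually belongs to.
        return Optional.of("Skipping " + url + "; different batch id (" + mark + ")");
    }

    public static void main(String[] args) {
        skipMessage("http://www.heraldscotland.com/", "batch-B", "batch-A")
            .ifPresent(System.out::println);
    }
}
```

Printing batchId in the message instead (the behaviour Lewis was unhappy with) would only echo back the value the user typed on the command line.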

        lewismc Lewis John McGibbney added a comment -

        New patch.
        If you are happy with it, I will commit. I checked the other jobs, and this seems to be all the logging I can improve within the scope of this issue and the debug option.

        lewismc Lewis John McGibbney added a comment -

        Reattached with ASF licensing.

        ferdy.g Ferdy Galema added a comment -

        Looks good. +1

        When the need arises, we can always add it to similar jobs too.

        lewismc Lewis John McGibbney added a comment -

        Committed at revision 1335436 in the Nutchgora branch.

        hudson Hudson added a comment -

        Integrated in Nutch-nutchgora #248 (See https://builds.apache.org/job/Nutch-nutchgora/248/)
        Commit to address NUTCH-1349 and update to CHANGES.txt (Revision 1335436)

        Result = SUCCESS
        lewismc :
        Files :

        • /nutch/branches/nutchgora/CHANGES.txt
        • /nutch/branches/nutchgora/conf/log4j.properties
        • /nutch/branches/nutchgora/src/bin/nutch
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/WebTableReader.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/indexer/IndexerJob.java
        • /nutch/branches/nutchgora/src/java/org/apache/nutch/parse/ParserJob.java

          People

          • Assignee:
            lewismc Lewis John McGibbney
            Reporter:
            lewismc Lewis John McGibbney
          • Votes:
            0
            Watchers:
            1
