Nutch / NUTCH-1461

Problem with TableUtil


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: nutchgora
    • Fix Version/s: 2.5
    • Component/s: parser
    • Labels: None
    • Environment: Debian / CDH3 / Nutch 2.0 Release
    • Patch Info: Patch Available

    Description

      Affects the parse and updatedb jobs.

      I think some malformed URLs got into HBase, but I can't find them.
      They do generate this error, though. If I empty HBase and restart, the crawl runs for a couple of million indexed pages and then the error comes up again. Any tips on how to locate the row in the table that generates this error?

      2012-08-28 01:48:10,871 WARN org.apache.hadoop.mapred.Child: Error running child
      java.lang.ArrayIndexOutOfBoundsException: 1
      at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:98)
      at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:102)
      at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
      at org.apache.hadoop.mapred.Child.main(Child.java:260)
      2012-08-28 01:48:10,875 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
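      The trace shows TableUtil.unreverseUrl failing on index 1 of an array, which suggests a row key that splits into fewer parts than the method expects. Below is a minimal sketch for finding such rows. It assumes (not verified against the Nutch source) that reversed keys look like <reversed host>:<protocol>[:<port>]<path>, for example com.example.www:http/index.html. The class ReversedKeyCheck and its methods are hypothetical and not part of Nutch. The sketch reads one row key per line from stdin and prints the line number and key of every key that does not match the assumed shape.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical diagnostic, not Nutch code. Assumes reversed row keys have the
 * shape "<reversed host>:<protocol>[:<port>]<path>".
 */
public class ReversedKeyCheck {

  /** Returns the part of the key before the first '/', or the whole key if there is none. */
  private static String prefix(String reversedUrl) {
    int pathBegin = reversedUrl.indexOf('/');
    return pathBegin == -1 ? reversedUrl : reversedUrl.substring(0, pathBegin);
  }

  /**
   * Mirrors the assumed failure mode: index 1 of the ':'-split prefix is read
   * unconditionally, so a key without ':' throws ArrayIndexOutOfBoundsException.
   */
  public static String naiveProtocol(String reversedUrl) {
    String[] splits = prefix(reversedUrl).split(":", -1); // -1 keeps empty tokens
    return splits[1];
  }

  /** Returns true if the key has a non-empty host, a non-empty protocol and an optional numeric port. */
  public static boolean isWellFormed(String reversedUrl) {
    String[] splits = prefix(reversedUrl).split(":", -1);
    if (splits.length < 2 || splits.length > 3) {
      return false; // no protocol part, or too many ':' before the path
    }
    if (splits[0].isEmpty() || splits[1].isEmpty()) {
      return false;
    }
    if (splits.length == 3 && !splits[2].matches("\\d+")) {
      return false; // port present but not numeric
    }
    return true;
  }

  /** Reads one key per line from stdin and prints "<line number>\t<key>" for each suspicious key. */
  public static void main(String[] args) throws IOException {
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    String line;
    long lineNo = 0;
    while ((line = in.readLine()) != null) {
      lineNo++;
      String key = line.trim();
      if (!key.isEmpty() && !isWellFormed(key)) {
        System.out.println(lineNo + "\t" + key);
      }
    }
  }
}
```

      For example, with the row keys dumped to a text file one per line (the dump step is not shown, and keys.txt is a placeholder name): java ReversedKeyCheck < keys.txt. The check is deliberately stricter than the naive split, so it also flags keys that would not throw but would still unreverse into a bogus URL, such as an unreversed "http://..." key.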

      Attachments

        1. TabelUtil_Fix.patch
          0.6 kB
          Christian Johnsson
        2. regex-urlfilter.txt
          2 kB
          Christian Johnsson


            People

              Assignee: Unassigned
              Reporter: Christian Johnsson (mr.johnsson)
              Votes: 0
              Watchers: 4
