
Key: NUTCH-406
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Chris A. Mattmann
Reporter: Doğacan Güney
Votes: 0
Watchers: 0

Nutch

Metadata tries to write null values

Created: 23/Nov/06 01:25 PM   Updated: 23/Nov/06 05:19 PM
Component/s: None
Affects Version/s: 0.9.0
Fix Version/s: 0.9.0

Time Tracking:
Not Specified

File Attachments:
NUTCH-406.patch (0.6 kB), 2006-11-23 04:16 PM, Doğacan Güney, licensed for inclusion in ASF works
NUTCH-406.patch (0.5 kB), 2006-11-23 01:27 PM, Doğacan Güney, licensed for inclusion in ASF works

Resolution Date: 23/Nov/06 05:17 PM


Description
During parsing, some URLs (PDFs in particular, it seems) can produce <some_key, null> pairs in ParseData's parseMeta.
When Metadata.write() tries to write such a pair, it throws a NullPointerException (NPE).

The stack trace looks something like this:
at org.apache.hadoop.io.Text.encode(Text.java:373)
at org.apache.hadoop.io.Text.encode(Text.java:354)
at org.apache.hadoop.io.Text.writeString(Text.java:394)
at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)

I can consistently reproduce this using the following URL:
http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
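
The failure mode can be illustrated with a small, self-contained sketch. It does not reproduce the attached patch or the real org.apache.nutch.metadata.Metadata class: the class name NullSafeMetadataSketch, its methods, and the choice to skip null values (rather than, say, store empty strings) are all assumptions made for illustration. java.io.DataOutputStream.writeUTF stands in for Text.writeString, since both fail with an NPE when handed null.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-valued metadata container; not the Nutch class. */
public class NullSafeMetadataSketch {

    // Each key maps to zero or more values; values may be null, as in the bug report.
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    /** Stores the value as given, including null, so write() has to cope with it. */
    public void add(String key, String value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /**
     * Writes the key count, then each key followed by the count of its
     * non-null values and the values themselves. Null values are skipped
     * (an assumed policy) so writeUTF is never called with null.
     */
    public void write(DataOutputStream out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            List<String> nonNull = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v != null) {
                    nonNull.add(v);
                }
            }
            out.writeUTF(e.getKey());
            out.writeInt(nonNull.size());
            for (String v : nonNull) {
                out.writeUTF(v);
            }
        }
    }

    /** Convenience helper: serializes the metadata to a byte array. */
    public static byte[] serialize(NullSafeMetadataSketch m) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            m.write(out);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        NullSafeMetadataSketch m = new NullSafeMetadataSketch();
        m.add("some_key", null);          // the kind of pair described above
        m.add("other_key", "some value");
        byte[] bytes = serialize(m);      // completes without an NPE
        System.out.println("serialized " + bytes.length + " bytes");
    }
}
```

Filtering at write time keeps the serialized form free of nulls without changing what callers may put into the map; rejecting or normalizing nulls when the value is added would be an equally plausible design.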



Change history (Field: Original Value -> New Value)
Doğacan Güney made changes - 23/Nov/06 01:27 PM
  Attachment: (none) -> NUTCH-406.patch [ 12345552 ]
Chris A. Mattmann made changes - 23/Nov/06 03:43 PM
  Assignee: (none) -> Chris A. Mattmann [ chrismattmann ]
Chris A. Mattmann made changes - 23/Nov/06 03:43 PM
  Status: Open [ 1 ] -> In Progress [ 3 ]
Doğacan Güney made changes - 23/Nov/06 04:16 PM
  Attachment: (none) -> NUTCH-406.patch [ 12345565 ]
Chris A. Mattmann made changes - 23/Nov/06 05:17 PM
  Fix Version/s: (none) -> 0.9.0 [ 12312013 ]
  Status: In Progress [ 3 ] -> Resolved [ 5 ]
  Resolution: (none) -> Fixed [ 1 ]
Chris A. Mattmann made changes - 23/Nov/06 05:19 PM
  Status: Resolved [ 5 ] -> Closed [ 6 ]