
Key: NUTCH-406
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Chris A. Mattmann
Reporter: Doğacan Güney
Votes: 0
Watchers: 0
Nutch

Metadata tries to write null values

Created: 23/Nov/06 01:25 PM   Updated: 23/Nov/06 05:19 PM
Component/s: None
Affects Version/s: 0.9.0
Fix Version/s: 0.9.0

Time Tracking:
Not Specified

File Attachments:
NUTCH-406.patch (text file, 0.6 kB, licensed for inclusion in ASF works), attached 2006-11-23 04:16 PM by Doğacan Güney
NUTCH-406.patch (text file, 0.5 kB, licensed for inclusion in ASF works), attached 2006-11-23 01:27 PM by Doğacan Güney

Resolution Date: 23/Nov/06 05:17 PM


Description
During parsing, some URLs (especially PDFs, it seems) may create <some_key, null> pairs in ParseData's parseMeta.
When Metadata.write() tries to write such a pair, it throws a NullPointerException (NPE).

The stack trace looks something like this:
at org.apache.hadoop.io.Text.encode(Text.java:373)
at org.apache.hadoop.io.Text.encode(Text.java:354)
at org.apache.hadoop.io.Text.writeString(Text.java:394)
at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)

I can consistently reproduce this using the following URL:
http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
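
To illustrate the failure mode and one possible guard, here is a minimal sketch. It is not the committed patch and does not use the Nutch or Hadoop classes: NullSafeMetadataSketch, serialize and deserialize are hypothetical names, a plain single-valued Map<String, String> stands in for parseMeta, and java.io.DataOutputStream.writeUTF (which also throws an NPE on a null string) stands in for Text.writeString. The guard shown here simply skips pairs whose value is null, and it assumes keys are never null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata container; not the Nutch Metadata class.
class NullSafeMetadataSketch {

    // Writes a pair count followed by key/value strings, skipping null values.
    // Assumption: keys are non-null; only values may be null.
    static byte[] serialize(Map<String, String> meta) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            // Count only the pairs that will actually be written, so the
            // reader is not told to expect more pairs than exist.
            int count = 0;
            for (String value : meta.values()) {
                if (value != null) {
                    count++;
                }
            }
            out.writeInt(count);
            for (Map.Entry<String, String> e : meta.entrySet()) {
                if (e.getValue() == null) {
                    continue; // writeUTF(null) would throw an NPE here
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // Reads back what serialize() wrote.
    static Map<String, String> deserialize(byte[] bytes) {
        Map<String, String> meta = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                meta.put(key, in.readUTF());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return meta;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Merger Agreement");
        meta.put("author", null); // the kind of pair that triggers the NPE
        System.out.println(deserialize(serialize(meta))); // prints {title=Merger Agreement}
    }
}
```

Skipping the pair at write time is only one option. Rejecting or dropping null values when they are first added to the metadata would keep them out of parseMeta altogether.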



Subversion Commits
Repository: ASF
Revision: #478619
Date: Thu Nov 23 17:15:55 UTC 2006
User: mattmann
Message:
- fix for NUTCH-406 Metadata tries to write null values
Files Changed
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/metadata/Metadata.java
MODIFY /lucene/nutch/trunk/CHANGES.txt

Repository: ASF
Revision: #478631
Date: Thu Nov 23 18:26:37 UTC 2006
User: mattmann
Message:
- use spaces instead of tabs in Metadata.java
- added test case for NUTCH-406 in TestMetadata.java
Files Changed
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/metadata/TestMetadata.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/metadata/Metadata.java