Issue Details

Key: NUTCH-414
Type: Bug
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Brian Whitman
Votes: 0
Watchers: 0
Nutch

parse-mp3 plugin concatenating previous tags for text field

Created: 12/Dec/06 03:29 PM   Updated: 12/Dec/06 03:29 PM
Component/s: fetcher
Affects Version/s: 0.9.0
Fix Version/s: None

Time Tracking:
Not Specified

Environment: -


Description
The parse-mp3 plugin appears to retain the text content of previous parses. For each new MP3 file it parses, it puts the contents of all previously parsed files' text fields into that file's plain-text field.

You can see this by fetching a set of MP3s in one segment and then viewing their plain text in the Nutch webapp. Each file's plain text includes the contents of the files fetched before it in that round, which makes searching fruitless.
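To make the symptom concrete, here is a minimal Java sketch of a stateful collector that is reused across files. `SketchCollector`, `addText()` and `AccumulationDemo` are hypothetical names; only the accumulating `text` field and `clearText()` correspond to the patch, and the newline separator is an assumption, not the plugin's actual format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the plugin's MetadataCollector. Only the
// accumulating `text` field and clearText() mirror the patch; addText()
// is an assumed name for whatever appends tag values.
class SketchCollector {
    private String text = "";

    void addText(String value) {
        text += value + "\n"; // assumed separator
    }

    void clearText() {
        text = "";
    }

    String getText() {
        return text;
    }
}

class AccumulationDemo {
    // Parses each "file" (here just a list of tag values) with ONE shared
    // collector, optionally clearing it first, and returns each file's text.
    static List<String> parseAll(List<List<String>> files, boolean clearFirst) {
        SketchCollector shared = new SketchCollector();
        List<String> results = new ArrayList<>();
        for (List<String> tags : files) {
            if (clearFirst) {
                shared.clearText();
            }
            for (String tag : tags) {
                shared.addText(tag);
            }
            results.add(shared.getText());
        }
        return results;
    }
}
```

With `clearFirst` set to false, the second file's text comes back containing the first file's tags followed by its own, which is the leak described above; with it set to true, each file only sees its own tags.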

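I made a small band-aid change to MP3Parser.java and MetadataCollector.java against the nightly build: MetadataCollector gets a clearText() method that resets its accumulated text, and MP3Parser calls it before parsing each file. It seems to fix the problem.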

--- MP3Parser.java 2006-12-10 09:43:26.000000000 -0500
+++ MP3Parser.java.new 2006-12-10 16:37:03.000000000 -0500
@@ -67,7 +67,7 @@
fos.write(raw);
fos.close();
MP3File mp3 = new MP3File(tmp);
-
+ metadataCollector.clearText();
if (mp3.hasID3v2Tag()) {
parse = getID3v2Parse(mp3, content.getMetadata());
} else if (mp3.hasID3v1Tag()) {
--- MetadataCollector.java 2006-12-10 09:43:26.000000000 -0500
+++ MetadataCollector.java.new 2006-12-10 16:37:28.000000000 -0500
@@ -42,6 +42,10 @@
this.conf = conf;
}

+ public void clearText() {
+ text = "";
+ }
+
public void notifyProperty(String name, String value) throws
MalformedURLException {
if (name.equals("TIT2-Text"))
setTitle(value);
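The patch resets shared state by hand. A common alternative, sketched here under stated assumptions and not taken from the Nutch code, is to give each parse its own collector so no text can survive between files. It would also avoid two calls clearing and appending to the same instance at once, if (as an assumption) one parser instance is shared by several fetcher threads. `PerParseCollector` and `PerParseParser` are hypothetical names.

```java
import java.util.List;

// Hypothetical collector; mirrors only the text accumulation from the patch.
class PerParseCollector {
    private final StringBuilder text = new StringBuilder();

    void addText(String value) {
        text.append(value).append('\n'); // assumed separator
    }

    String getText() {
        return text.toString();
    }
}

// Hypothetical parser sketch: the collector is a local variable, so each
// call starts from empty state and nothing needs to be cleared.
class PerParseParser {
    String parse(List<String> tagValues) {
        PerParseCollector collector = new PerParseCollector();
        for (String value : tagValues) {
            collector.addText(value);
        }
        return collector.getText();
    }
}
```

In the real plugin the collector's constructor stores a configuration (`this.conf = conf;` in the diff context), so creating one per parse would mean passing the parser's configuration in on each call.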



Subversion Commits
There are no subversion log entries for this issue yet.