[NUTCH-139] Standard metadata property names in the ParseData metadata - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.6, 0.7, 0.7.1, 0.7.2, 0.8
Fix Version/s: 0.8
Component/s: fetcher
Labels:
None
Environment:

Power Mac, OS X 10.4, dual-processor G5 2.0 GHz, 1.5 GB RAM, although the bug is independent of environment

Description

Currently, people are free to name their string-based properties anything they want, so names such as "Content-type", "content-TyPe", and "CONTENT_TYPE" can all carry the same meaning. Stefan G., I believe, proposed a solution in which all property names are converted to lower case, but in essence this only fixes half the problem: it identifies that "CONTENT_TYPE" and "conTeNT_TyPE" and all the other case permutations are really the same. What about if I named it "Content Type", or "ContentType"?
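To illustrate why lower-casing alone is only a partial fix, here is a minimal sketch. The class and method names (LowerCaseDemo, distinctLowerCased) are hypothetical and are not part of Nutch; the sample names are the ones mentioned above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class; not part of the Nutch code base.
class LowerCaseDemo {

    // Lower-cases every name and returns the distinct results in encounter order.
    static Set<String> distinctLowerCased(List<String> names) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String name : names) {
            distinct.add(name.toLowerCase(Locale.ROOT));
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Content-type", "content-TyPe", "CONTENT_TYPE",
                "conTeNT_TyPE", "Content Type", "ContentType");
        // Case differences collapse, but separator differences
        // (hyphen, underscore, space, none) still yield separate names.
        System.out.println(distinctLowerCased(names));
    }
}
```

Six spellings go in and four distinct names still come out, because lower-casing does nothing about the differing separators.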

I propose that a way to correct this would be to create a standard set of named Strings in the ParseData class that the protocol framework and the parsing framework could use to identify common properties such as "Content-type", "Creator", "Language", etc.

The properties would be defined at the top of the ParseData class, something like:

public class ParseData {
  .....
  public static final String CONTENT_TYPE = "content-type";
  public static final String CREATOR = "creator";
  ....
}

In this fashion, users would at least know the names of the standard properties that they can obtain from the ParseData, for example by calling ParseData.getMetadata().get(ParseData.CONTENT_TYPE) to get the content type, or ParseData.getMetadata().set(ParseData.CONTENT_TYPE, "text/xml") to set it. Of course, this wouldn't preclude users from doing what they are currently doing; it would just provide a standard way of obtaining some of the more common, critical metadata without poring over the code base to figure out what the properties are named.
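As a rough, self-contained sketch of the idea: the holder class StandardNames is hypothetical, and a plain java.util.Map stands in for the ParseData metadata container, so none of this is the actual Nutch API. Only the two constant names and values come from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the constants proposed for ParseData.
final class StandardNames {
    static final String CONTENT_TYPE = "content-type";
    static final String CREATOR = "creator";

    private StandardNames() {
        // Constants holder; not meant to be instantiated.
    }
}

// Hypothetical demo; a plain Map is assumed in place of the real metadata object.
class StandardNamesDemo {

    // Plays the role of a parser that fills in metadata using the shared names.
    static Map<String, String> parseMetadata() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(StandardNames.CONTENT_TYPE, "text/xml");
        metadata.put(StandardNames.CREATOR, "example author");
        // Ad hoc names remain possible, as the proposal intends.
        metadata.put("x-custom-field", "still allowed");
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = parseMetadata();
        // A consumer looks the value up through the same constant,
        // so it never has to guess how the producer spelled the key.
        System.out.println(metadata.get(StandardNames.CONTENT_TYPE));
    }
}
```

Because producer and consumer both refer to the same constant, a misspelled name becomes a compile-time error instead of a silent lookup miss.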

I'll contribute a patch that addresses this issue near the end of this week or the beginning of next week.

Attachments

- NUTCH-139.060208.patch, 09/Feb/06 01:45, 131 kB, Jerome Charron
- NUTCH-139.060105.patch, 06/Jan/06 07:06, 112 kB, Jerome Charron

People

Assignee: Chris A. Mattmann
Reporter: Chris A. Mattmann
Votes: 1
Watchers: 2

Dates

Due: 19/Dec/05
Created: 14/Dec/05 13:02
Updated: 24/Oct/06 16:14
Resolved: 09/Feb/06 06:51