Details
Improvement
Status: Closed
Minor
Resolution: Fixed
Jena 2.12.1
None
Description
WARN 2014-12-05 14:21:24,085 (com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler:47) - http://www.w3.org/ns/oa#(line 42 column 36): {W116} ISO-639 does not define language: 'vls'.
http://www.w3.org/ns/oa.rdf contains:
<dc:creator xml:lang="vls">Herbert Van de Sompel</dc:creator>
but ISO 639-3 does define it: http://www-01.sil.org/iso639-3/documentation.asp?id=vls
The complete ISO 639-3 list is not included in https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/rdfxml/xmlinput/lang/Iso639.java, which covers only ISO 639-1 and ISO 639-2.
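To illustrate why the warning appears, here is a minimal Java sketch (not Jena's actual code) of a language-tag check that consults only ISO 639-1/-2 codes unless ISO 639-3 support is switched on. The class `LangCheck`, the method `isKnown` and the two tiny code sets are hypothetical; a real table would hold the full code lists.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: LangCheck and its sets are illustrative, not Jena's API.
// The sets hold only a handful of sample codes.
final class LangCheck {
    // Sample codes from ISO 639-1 (two letters) and ISO 639-2 (B and T forms).
    static final Set<String> ISO639_1_AND_2 =
            Set.of("en", "eng", "nl", "nld", "dut", "fr", "fra", "fre");
    // Sample codes defined only in ISO 639-3.
    static final Set<String> ISO639_3_ONLY = Set.of("vls", "bbj");

    private LangCheck() {}

    // Returns the primary language subtag of a tag such as "vls" or "en-GB", lower-cased.
    static String primarySubtag(String langTag) {
        return langTag.split("-", 2)[0].toLowerCase(Locale.ROOT);
    }

    // True if the primary subtag is known; ISO 639-3 codes are consulted only when requested.
    static boolean isKnown(String langTag, boolean include6393) {
        String primary = primarySubtag(langTag);
        return ISO639_1_AND_2.contains(primary)
                || (include6393 && ISO639_3_ONLY.contains(primary));
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-GB", "vls"}) {
            System.out.println(tag + ": 639-1/-2 only=" + isKnown(tag, false)
                    + ", with 639-3=" + isKnown(tag, true));
        }
    }
}
```

With only the 639-1/-2 set, "vls" is rejected, which corresponds to the W116 warning above; adding the 639-3 identifiers makes it pass.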
The current code tables can be downloaded from http://www-01.sil.org/iso639-3/download.asp, e.g. http://www-01.sil.org/iso639-3/iso-639-3.tab (the file is UTF-8, although the browser displays it with a different encoding).
I can work on a script to update this. One open question is whether Iso639.java needs a new field for the identifiers of languages that are in neither ISO 639-1 nor ISO 639-2 (e.g. "vls"). Another is whether we should include the proper UTF-8 names of the languages so that the accents come out correctly, e.g.
bbj I L Ghomálá'
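A possible starting point for such an update script is sketched below in Java. It assumes the tab file's column order is Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment, with a header row first; `Iso6393Table`, `Entry`, `is6393Only` and the `{"code", "name"},` output format are hypothetical and do not reflect the real layout of Iso639.java. The two sample rows stand in for the downloaded file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Iso6393Table and Entry are illustrative names, not Jena's API.
// Assumes the column order Id, Part2B, Part2T, Part1, Scope, Language_Type,
// Ref_Name, Comment, with a header row first.
final class Iso6393Table {

    record Entry(String id, String part2b, String part2t, String part1,
                 String scope, String type, String refName) {
        // A code is "639-3 only" when it has no ISO 639-1 or ISO 639-2 counterpart.
        boolean is6393Only() {
            return part2b.isEmpty() && part2t.isEmpty() && part1.isEmpty();
        }
    }

    private Iso6393Table() {}

    // Parses rows of the tab file; lines must already be decoded as UTF-8.
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("Id\t")) {
                continue; // skip blank lines and the header row
            }
            // limit -1 keeps trailing empty fields such as an empty Comment
            String[] f = line.split("\t", -1);
            if (f.length < 7) {
                throw new IllegalArgumentException("Unexpected row: " + line);
            }
            entries.add(new Entry(f[0], f[1], f[2], f[3], f[4], f[5], f[6]));
        }
        return entries;
    }

    // Renders an entry as a Java array-initializer line, escaping backslashes and quotes.
    static String toJavaLine(Entry e) {
        String name = e.refName().replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"" + e.id() + "\", \"" + name + "\"},";
    }

    public static void main(String[] args) {
        // Sample rows in the assumed layout; a real script would read the file with
        // Files.readAllLines(Path.of("iso-639-3.tab"), StandardCharsets.UTF_8).
        List<String> sample = List.of(
                "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment",
                "bbj\t\t\t\tI\tL\tGhom\u00e1l\u00e1'\t",
                "nld\tdut\tnld\tnl\tI\tL\tDutch\t");
        for (Entry e : parse(sample)) {
            System.out.println(toJavaLine(e) + (e.is6393Only() ? " // 639-3 only" : ""));
        }
    }
}
```

The `is6393Only` flag is one way to answer the "new field" question: entries without a 639-1/-2 counterpart can be marked rather than stored in a separate table. The accented name is written as a `\u` escape in the source so that it survives compilation regardless of the compiler's default encoding.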
I'm not sure whether the terms of use are compatible with the Apache License:
ISO 639-3 Code Tables Terms of Use
The ISO 639-3 code set may be downloaded and incorporated into software products, web-based systems, digital devices, etc., either commercial or non-commercial, provided that:
attribution is given www.sil.org/iso639-3/ as the source of the codes;
the identifiers of the code set are not modified or extended except as may be privately agreed using the Private Use Area (range qaa to qtz), and then such extensions shall not be distributed publicly;
the product, system, or device does not provide a means to redistribute the code set.
The last condition might mean we should not include the *.tab files directly, but would the listing in Iso639.java constitute a "means to redistribute the code set"?
Is "the identifiers of the code set are not modified" compatible with the Apache License, which presumably allows you to modify anything?