Apache Jena / JENA-827

Include all ISO 639-3 languages

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Jena 2.12.1
    • Fix Version/s: Jena 2.13.0
    • Component/s: RDF/XML
    • Labels: None

    Description

      WARN 2014-12-05 14:21:24,085 (com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler:47) - http://www.w3.org/ns/oa#(line 42 column 36):
      {W116}
      
      ISO-639 does not define language: 'vls'.
      

      http://www.w3.org/ns/oa.rdf says

        <dc:creator xml:lang="vls">Herbert Van de Sompel</dc:creator>
      

      but ISO 639-3 does define it: http://www-01.sil.org/iso639-3/documentation.asp?id=vls

      The complete ISO 639-3 list is not included in https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/rdfxml/xmlinput/lang/Iso639.java - only ISO 639-1 and ISO 639-2 are.

      The new lists can be found at http://www-01.sil.org/iso639-3/download.asp - e.g. http://www-01.sil.org/iso639-3/iso-639-3.tab (the file is UTF-8, although the browser may guess a different encoding).

      I can work on the script to update this. One question is whether Iso639.java needs a new field for the identifier of all those languages which are not in -1 and -2 (e.g. "vls"). Another is whether we should include the proper UTF-8 names of the languages to get the accents correct, e.g.

      bbj I L Ghomálá'
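
      As a rough illustration of what such an update script would have to do, here is a minimal Java sketch that reads the table as UTF-8 and splits each row on tabs. The class `Iso639TabSketch`, its `Entry` type and all field names are hypothetical and not part of Jena. The column order (Id, Part2B, Part2T, Part1, Scope, Language_Type, Ref_Name, Comment) and the `Id` header row are assumptions about the iso-639-3.tab layout; they are consistent with the "bbj" row above but should be checked against the downloaded file.

      ```java
      import java.io.IOException;
      import java.nio.charset.StandardCharsets;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical helper, not part of Jena.
      final class Iso639TabSketch {

          /** One table row. The column order is an assumption (see lead-in). */
          static final class Entry {
              final String id;      // ISO 639-3 identifier, e.g. "vls"
              final String part2b;  // ISO 639-2 bibliographic code, "" if none
              final String part2t;  // ISO 639-2 terminology code, "" if none
              final String part1;   // ISO 639-1 code, "" if none
              final String scope;   // e.g. "I"
              final String type;    // e.g. "L"
              final String name;    // reference name, may contain accents

              Entry(String[] f) {
                  id = f[0]; part2b = f[1]; part2t = f[2]; part1 = f[3];
                  scope = f[4]; type = f[5]; name = f[6];
              }
          }

          static Entry parseLine(String line) {
              // Limit -1 keeps empty columns, including trailing ones.
              String[] f = line.split("\t", -1);
              if (f.length < 7) {
                  throw new IllegalArgumentException("Expected at least 7 columns: " + line);
              }
              return new Entry(f);
          }

          static List<Entry> parse(List<String> lines) {
              List<Entry> out = new ArrayList<>();
              for (String line : lines) {
                  // Skip blank lines and the assumed header row.
                  if (line.isEmpty() || line.startsWith("Id\t")) continue;
                  out.add(parseLine(line));
              }
              return out;
          }

          static List<Entry> load(Path tabFile) throws IOException {
              // Decode explicitly as UTF-8 so names such as Ghomálá' survive.
              return parse(Files.readAllLines(tabFile, StandardCharsets.UTF_8));
          }

          public static void main(String[] args) throws IOException {
              if (args.length == 0) {
                  System.err.println("usage: Iso639TabSketch <path to iso-639-3.tab>");
                  return;
              }
              // Print the languages that have no -1 or -2 code.
              for (Entry e : load(Paths.get(args[0]))) {
                  if (e.part1.isEmpty() && e.part2b.isEmpty() && e.part2t.isEmpty()) {
                      System.out.println(e.id + "\t" + e.name);
                  }
              }
          }
      }
      ```

      Rows such as "vls" and "bbj" would come out with empty -1 and -2 fields, which is the case the new identifier field would have to cover. The sketch does not address the licensing question below.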

      I'm not sure if the terms of use are compatible with the Apache License:

      ISO 639-3 Code Tables Terms of Use

      The ISO 639-3 code set may be downloaded and incorporated into software products, web-based systems, digital devices, etc., either commercial or non-commercial, provided that:

      attribution is given www.sil.org/iso639-3/ as the source of the codes;
      the identifiers of the code set are not modified or extended except as may be privately agreed using the Private Use Area (range qaa to qtz), and then such extensions shall not be distributed publicly;
      the product, system, or device does not provide a means to redistribute the code set.

      The last condition might mean we should not include the *.tab files directly, but would the listing in Iso639.java constitute a "means to redistribute the code set"?

      Is "the identifiers of the code set are not modified" compatible with the Apache License, which presumably allows you to modify anything?

          People

            Assignee: Andy Seaborne
            Reporter: Stian Soiland-Reyes (old) (Inactive)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 48h
                Remaining Estimate: 48h
                Time Spent: Not Specified