  1. Apache Any23
  2. ANY23-65

Update to RDFa extraction stylesheet

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 1.0
    • Component/s: core
    • Labels:

      Description

      The RDFa 1.1 Core specification says that namespace prefixes in HTML5 should be declared in a "prefix" attribute, with whitespace between each prefix and its namespace URI, like this: "ns1: http://example.org/ ns2: http://example.com/"

      My sample HTML page follows this syntax, but Sindice, which uses Any23, didn't read my namespaces correctly. I traced the problem to the XSLT template "tokenize2" in the rdfa.xslt stylesheet and changed the template accordingly. The template expected "ns1:http://example.org/ ns2:http://example.com/" (no space between prefix and namespace URI) and did not normalize whitespace such as line breaks (although I'm not sure the latter broke the functionality).

      I use Any23 0.6.1 locally, but http://svn.apache.org/viewvc/incubator/any23/trunk/core/src/main/resources/org/apache/any23/extractor/rdfa/rdfa.xslt?revision=1231556&view=markup shows that the template is the same in the trunk.

      A possible problem is that the new template will not accept namespace definitions without spaces, such as those found in the RDFa produced by Best Buy. A further improvement to my template would be to accept namespace definitions both with and without spaces.
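
      To illustrate the tolerant behaviour suggested above, here is a minimal sketch in Python rather than XSLT. It is not the attached patch and not Any23 code; the function name parse_prefix_mappings and the regular expression are assumptions made for this example only. It accepts a space, no space, or a line break between a prefix and its namespace URI.

```python
import re

# Hypothetical helper for illustration; not part of Any23 or rdfa.xslt.
# A pair is: a prefix (no whitespace or colon), a colon, optional
# whitespace (spaces, tabs, line breaks), then a run of non-whitespace
# characters taken as the namespace URI.
_PAIR = re.compile(r"([^\s:]+):\s*(\S+)")


def parse_prefix_mappings(value):
    """Return a {prefix: uri} dict from a "prefix" attribute value."""
    return {m.group(1): m.group(2) for m in _PAIR.finditer(value)}


if __name__ == "__main__":
    spaced = "ns1: http://example.org/ ns2: http://example.com/"
    unspaced = "ns1:http://example.org/ ns2:http://example.com/"
    print(parse_prefix_mappings(spaced))
    print(parse_prefix_mappings(unspaced))
```

      Because the whitespace after the colon is optional in the pattern, the spaced and unspaced forms yield the same mappings, and no separate whitespace-normalization step is needed in this sketch.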

        Attachments

        1. rdfa.xslt (37 kB, Ben Companjen)
        2. stylesheet.patch (3 kB, Ben Companjen)
        3. stylesheet3.patch (5 kB, Ben Companjen)
        4. test.patch (2 kB, Ben Companjen)
        5. rdfa-11-curies-a.html (0.8 kB, Ben Companjen)

            People

            • Assignee: Unassigned
            • Reporter: Ben Companjen (bencomp)

                Time Tracking

                • Estimated: 3h
                • Remaining: 3h
                • Logged: Not Specified