  1. ManifoldCF
  2. CONNECTORS-805

Crawling author metadata from feeds

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.4
    • Fix Version/s: ManifoldCF 1.5
    • Component/s: RSS connector
    • Labels: None

    Description

      Add functionality for retrieving the author of an RSS entry.

      The feed specifications each treat the author differently:

      RSS 2.0 (Source -> http://www.rssboard.org/rss-specification#ltauthorgtSubelementOfLtitemgt):

      <author> sub-element of <item>
      It's the email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the <item> describes.
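The RSS 2.0 case can be sketched as follows. This is a minimal illustration using the JDK's DOM parser, not the code from the attached patch; the class and method names (`RssItemAuthorSketch`, `itemAuthors`, `directChildText`) are hypothetical, and the connector itself may parse feeds differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of ManifoldCF or the patch.
final class RssItemAuthorSketch {

  /** Returns one value per <item>: the trimmed <author> text, or null if the item has none. */
  static List<String> itemAuthors(String rssXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(rssXml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse RSS XML", e);
    }
    List<String> authors = new ArrayList<>();
    NodeList items = doc.getElementsByTagName("item");
    for (int i = 0; i < items.getLength(); i++) {
      authors.add(directChildText((Element) items.item(i), "author"));
    }
    return authors;
  }

  /** Inspects direct children only, so a nested element with the same name is ignored. */
  static String directChildText(Element parent, String name) {
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && name.equals(n.getNodeName())) {
        return n.getTextContent().trim();
      }
    }
    return null;
  }
}
```

Because the specification defines the value as an email address, a caller would probably store it as-is rather than try to split out a display name.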

      Atom (Source -> http://www.ietf.org/rfc/rfc4287.txt):

      The "atom:author" element is a Person construct that indicates the author of the entry or feed.

      atomAuthor = element atom:author { atomPersonConstruct }

      If an atom:entry element does not contain atom:author elements, then
      the atom:author elements of the contained atom:source element are
      considered to apply. In an Atom Feed Document, the atom:author
      elements of the containing atom:feed element are considered to apply
      to the entry if there are no atom:author elements in the locations
      described above.
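The fallback order quoted above (entry, then the entry's atom:source, then the containing atom:feed) could be implemented roughly like this. It is a sketch under assumptions: the input is an Atom Feed Document whose root is atom:feed, only atom:name is collected, and the class and method names (`AtomAuthorFallbackSketch`, `entryAuthorNames`) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of ManifoldCF or the patch.
final class AtomAuthorFallbackSketch {
  static final String ATOM_NS = "http://www.w3.org/2005/Atom";

  /** Returns, for each atom:entry, the author names that apply to it. */
  static List<List<String>> entryAuthorNames(String atomXml) {
    Element feed;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed to match on the Atom namespace
      feed = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(atomXml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse Atom XML", e);
    }
    List<List<String>> result = new ArrayList<>();
    for (Element entry : children(feed, "entry")) {
      // 1. Authors declared directly on the entry.
      List<Element> authors = children(entry, "author");
      // 2. Otherwise, authors of the contained atom:source element.
      if (authors.isEmpty()) {
        for (Element source : children(entry, "source")) {
          authors.addAll(children(source, "author"));
        }
      }
      // 3. Otherwise, authors of the containing atom:feed element.
      if (authors.isEmpty()) {
        authors = children(feed, "author");
      }
      List<String> names = new ArrayList<>();
      for (Element author : authors) {
        for (Element name : children(author, "name")) {
          names.add(name.getTextContent().trim());
        }
      }
      result.add(names);
    }
    return result;
  }

  /** Direct child elements in the Atom namespace with the given local name. */
  static List<Element> children(Element parent, String localName) {
    List<Element> out = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && ATOM_NS.equals(n.getNamespaceURI())
          && localName.equals(n.getLocalName())) {
        out.add((Element) n);
      }
    }
    return out;
  }
}
```

Matching on namespace URI and local name rather than on a literal `atom:` prefix keeps the lookup working whichever prefix a feed happens to declare.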

      The atomPersonConstruct looks like this:

      atomPersonConstruct =
         atomCommonAttributes,
         (element atom:name { text }
          & element atom:uri { atomUri }?
          & element atom:email { atomEmailAddress }?
          & extensionElement*)

      where atomCommonAttributes is defined like this:

      atomCommonAttributes =
         attribute xml:base { atomUri }?,
         attribute xml:lang { atomLanguageTag }?,
         undefinedAttribute*
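One way to read the grammar above: atom:name is required, atom:uri and atom:email are optional, and xml:lang may appear as a common attribute. The sketch below maps a single Person construct onto a small value class under those rules. The class name `AtomPersonSketch` and its fields are hypothetical, extension elements and xml:base are ignored, and only an xml:lang set directly on the element is read (an inherited value is not resolved).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical value class for illustration; not part of ManifoldCF or the patch.
final class AtomPersonSketch {
  static final String ATOM_NS = "http://www.w3.org/2005/Atom";
  static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

  final String name;  // required by the grammar
  final String uri;   // optional, null when absent
  final String email; // optional, null when absent
  final String lang;  // xml:lang on the element itself, null when absent

  private AtomPersonSketch(String name, String uri, String email, String lang) {
    this.name = name;
    this.uri = uri;
    this.email = email;
    this.lang = lang;
  }

  /** Parses a standalone atom:author or atom:contributor element. */
  static AtomPersonSketch fromXml(String personXml) {
    Element person;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      person = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(personXml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse Person construct", e);
    }
    String name = childText(person, "name");
    if (name == null) {
      throw new IllegalArgumentException("atom:name is required in a Person construct");
    }
    String lang = person.getAttributeNS(XML_NS, "lang"); // "" when the attribute is absent
    return new AtomPersonSketch(name, childText(person, "uri"),
        childText(person, "email"), lang.isEmpty() ? null : lang);
  }

  /** Text of the first direct Atom child with the given local name, or null. */
  static String childText(Element parent, String localName) {
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && ATOM_NS.equals(n.getNamespaceURI())
          && localName.equals(n.getLocalName())) {
        return n.getTextContent().trim();
      }
    }
    return null;
  }
}
```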

      Furthermore, there is an atom:contributor element:

      The "atom:contributor" element is a Person construct that indicates a person or other entity who contributed to the entry or feed.

      atomContributor = element atom:contributor { atomPersonConstruct }

      For further information, please check the specification.

      Dublin Core (Source -> http://dublincore.org/documents/dcmi-type-vocabulary/index.shtml#elements-creator):

      <dc:creator>
      The primary individual responsible for the content of the resource.

      The element can be at the <item>, <image> or <channel> level.
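A sketch of reading dc:creator from an RSS 2.0 feed follows. The source says only that the element may appear at several levels; falling back from the item to the channel value is an assumption made here for illustration. The class and method names (`DcCreatorSketch`, `itemCreators`) are hypothetical, and the `<image>` level is not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of ManifoldCF or the patch.
final class DcCreatorSketch {
  static final String DC_NS = "http://purl.org/dc/elements/1.1/";

  /** One value per <item>: its dc:creator, else the channel's dc:creator (assumed fallback), else null. */
  static List<String> itemCreators(String rssXml) {
    Element rss;
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed to resolve the dc prefix
      rss = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(rssXml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse RSS XML", e);
    }
    List<String> creators = new ArrayList<>();
    // RSS 2.0 layout assumed: <rss><channel><item>...</item></channel></rss>
    for (Element channel : children(rss, null, "channel")) {
      List<Element> channelLevel = children(channel, DC_NS, "creator");
      String channelCreator =
          channelLevel.isEmpty() ? null : channelLevel.get(0).getTextContent().trim();
      for (Element item : children(channel, null, "item")) {
        List<Element> itemLevel = children(item, DC_NS, "creator");
        creators.add(itemLevel.isEmpty()
            ? channelCreator
            : itemLevel.get(0).getTextContent().trim());
      }
    }
    return creators;
  }

  /** Direct child elements with the given namespace URI (null for none) and local name. */
  static List<Element> children(Element parent, String ns, String localName) {
    List<Element> out = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && Objects.equals(ns, n.getNamespaceURI())
          && localName.equals(n.getLocalName())) {
        out.add((Element) n);
      }
    }
    return out;
  }
}
```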

      Attachments

        1. RSSConnector.java.patch
          6 kB
          Benjamin Brandmeier

          People

            Assignee: Karl Wright
            Reporter: Benjamin Brandmeier
            Votes: 0
            Watchers: 2
