Tika / TIKA-1765

Some doc and docx store multiple authors as semi-colon delimited list

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed

    Description

      It looks like doc and docx are storing multiple authors in a single author field delimited by semi-colons. We should parse this value and add multiple authors where appropriate.

      Note: when I added an author whose name contains a semicolon, the result was two authors, so there doesn't appear to be any escaping going on.

      We should check to see what's going on in the other MS formats and with other metadata items that are allowed to be multivalued in Dublin Core.
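
A minimal sketch of the splitting proposed above, written in plain Java without the Tika dependency. The class `AuthorSplitter` and the method `splitAuthors` are hypothetical names chosen for illustration and are not the code that was committed for this issue. It assumes the semicolon is always a delimiter, which matches the observation that no escaping takes place, so a name that genuinely contains a semicolon would still be split in two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Tika's actual implementation.
class AuthorSplitter {

    // Splits a raw author field on semicolons, trims surrounding whitespace,
    // and drops empty entries. Assumes ';' is never escaped.
    static List<String> splitAuthors(String raw) {
        List<String> authors = new ArrayList<>();
        if (raw == null) {
            return authors;
        }
        for (String part : raw.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        // Each resulting name would be added as a separate author value.
        for (String author : splitAuthors("Jane Doe; John Smith;")) {
            System.out.println(author);
        }
    }
}
```

In a parser, each returned name would presumably be added to the metadata as its own value rather than set as a single string, so that the author field ends up multivalued.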

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2