Tika / TIKA-1765

Some doc and docx files store multiple authors as a semicolon-delimited list

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      It looks like doc and docx files store multiple authors in a single author field, delimited by semicolons. We should parse this value and add each author as a separate metadata value where appropriate.

      Note: when I tried to add an author with a semicolon in the name, the result was two authors, so there doesn't appear to be any escaping going on.

      We should check to see what's going on in the other MS formats and with other metadata items that are allowed to be multivalued in Dublin Core.
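
      A minimal sketch of the parsing step described above, assuming a plain split on semicolons with no escape handling (consistent with the note that a semicolon inside a name produced two authors). The class and method names are hypothetical, and this is not the change committed for this issue:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tika's API.
final class AuthorSplitter {

    private AuthorSplitter() {
    }

    // Splits a raw author field on semicolons, trims each piece,
    // and drops empty entries. No escaping is recognised, so a name
    // that itself contains a semicolon is split into two authors.
    static List<String> splitAuthors(String rawAuthor) {
        List<String> authors = new ArrayList<>();
        if (rawAuthor == null) {
            return authors;
        }
        for (String part : rawAuthor.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }
}
```

      For example, `AuthorSplitter.splitAuthors("Jane Doe; John Smith")` yields two entries, each of which a parser could then add as its own value of the multivalued author field.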

            People

            • Assignee: Unassigned
            • Reporter: Tim Allison (tallison@apache.org)
            • Votes: 0
            • Watchers: 2
