Tika / TIKA-1441

ExternalParsers should allow dynamic keys to be specified for Regexs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7
    • Component/s: parser
    • Labels:
    • Environment:

      While working on TIKA-605 and Memex

      Description

      While working on TIKA-605, I was trying to use ExternalParsers and came across an interesting use case: what if there are so many metadata keys that specifying each of them by hand as an individual regex would be repetitive and tedious? What if the metadata key itself could also be specified by the regex, e.g., the first capture group is taken as the key and the second group as the actual value? I ran into this while parsing GDAL output, so a very simple improvement would be to let the ExternalParsers Map<Pattern, String> map accept a null or empty ("") String as the value, meaning that the Pattern itself specifies both the key name and the key value.
      I've got a patch that I'll upload; all tests pass, and I need this to get TIKA-605 in and done.
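      The proposal above can be sketched roughly as follows. This is a minimal, hypothetical illustration under stated assumptions, not the patch attached to this issue: the class DynamicKeyMetadataSketch and its extract method are invented names, a plain Map<String, List<String>> stands in for Tika's Metadata object, and the sample output lines are made up rather than captured from GDAL. It keeps the Map<Pattern, String> shape from the description: a non-empty value names the metadata key and the pattern's first group supplies the value, while a null or "" value means group 1 is the key and group 2 is the value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed behaviour, not the actual Tika patch.
// A Map<String, List<String>> stands in for Tika's Metadata object.
class DynamicKeyMetadataSketch {

    // Applies every pattern to every line of an external tool's output.
    //  - non-empty key: group(1) is stored under that fixed key (static mapping)
    //  - null or "" key: group(1) names the key, group(2) is the value (dynamic mapping)
    static Map<String, List<String>> extract(String toolOutput, Map<Pattern, String> patterns) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String line : toolOutput.split("\\R")) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher m = entry.getKey().matcher(line);
                if (!m.find()) {
                    continue;
                }
                String fixedKey = entry.getValue();
                boolean dynamic = fixedKey == null || fixedKey.isEmpty();
                // Skip patterns that lack the capture groups their mode needs.
                if (m.groupCount() < (dynamic ? 2 : 1)) {
                    continue;
                }
                String key = dynamic ? m.group(1) : fixedKey;
                String value = dynamic ? m.group(2) : m.group(1);
                metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Illustrative lines in a gdalinfo-like "KEY=VALUE" style, not captured tool output.
        String output = "Driver: GTiff/GeoTIFF\n"
                + "  AREA_OR_POINT=Area\n"
                + "  TIFFTAG_SOFTWARE=example\n";

        Map<Pattern, String> patterns = new LinkedHashMap<>();
        patterns.put(Pattern.compile("^Driver: (.+)$"), "Driver"); // one hand-written key
        patterns.put(Pattern.compile("^\\s*([A-Z_]+)=(.*)$"), ""); // "" means: key comes from group 1

        System.out.println(extract(output, patterns));
        // prints {Driver=[GTiff/GeoTIFF], AREA_OR_POINT=[Area], TIFFTAG_SOFTWARE=[example]}
    }
}
```

      A single dynamic pattern such as ^\s*([A-Z_]+)=(.*)$ covers every KEY=VALUE line, which is exactly the repetition the description wants to avoid, while the fixed-key form still handles one-off fields like the driver name.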


              People

              • Assignee: Chris A. Mattmann (chrismattmann)
              • Reporter: Chris A. Mattmann (chrismattmann)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: