Tika / TIKA-1609

Leverage Google's LibPhonenumber for enhanced phone number extraction and metadata modeling


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
    • Component/s: core
    • Labels: None

    Description

      Google's libphonenumber can give us comprehensive support for properly modeling phone number metadata in Tika.
      While developing this patch I realized three things:

      • This is not a parser as such, because phone numbers are not mapped to any particular MIME type.
      • A single document can contain many phone numbers, so this is most likely a ContentHandler of some sort.
      • Tika's Metadata support is currently too restrictive to let us persist complex values (e.g. String → Object mappings). We need to extend Metadata support beyond String and String[] values.
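      A minimal sketch of the ContentHandler idea, assuming only the JDK's SAX API (Tika parsers emit XHTML as SAX events to an org.xml.sax.ContentHandler). The handler buffers all character data, because one document may hold many numbers and SAX can split a text node across several characters() calls, then scans the whole text at endDocument(). The names PhoneNumberHandler, PhoneNumberCandidate, ObjectMetadata and the key "phonenumbers" are hypothetical, not existing Tika API. ObjectMetadata illustrates the third point: a multi-valued store holding arbitrary Objects instead of String/String[]. The regex is a deliberately naive stand-in for libphonenumber's PhoneNumberUtil.getInstance().findNumbers(text, defaultRegion), which also validates numbers per region; the regex will, for example, also match dates such as 2015-04-01.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical value object: carries more structure than a bare String. */
final class PhoneNumberCandidate {
    final String raw;    // text exactly as it appeared in the document
    final String digits; // digits only, with '+' re-prefixed if raw started with '+'

    PhoneNumberCandidate(String raw) {
        this.raw = raw;
        String d = raw.replaceAll("[^0-9]", "");
        this.digits = raw.startsWith("+") ? "+" + d : d;
    }
}

/** Hypothetical multi-valued metadata: String keys mapped to lists of arbitrary Objects. */
final class ObjectMetadata {
    private final Map<String, List<Object>> values = new LinkedHashMap<>();

    void add(String key, Object value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<Object> getValues(String key) {
        return Collections.unmodifiableList(values.getOrDefault(key, Collections.emptyList()));
    }
}

/** Hypothetical SAX handler collecting every phone-number-like substring of a document. */
class PhoneNumberHandler extends DefaultHandler {
    // Naive stand-in for PhoneNumberUtil.findNumbers: optional '+', optional '(',
    // a digit, at least 6 digits/spaces/parentheses/dots/dashes, ending on a digit.
    private static final Pattern CANDIDATE = Pattern.compile("\\+?\\(?\\d[\\d ().-]{6,}\\d");
    static final String KEY = "phonenumbers"; // hypothetical metadata key

    private final StringBuilder text = new StringBuilder();
    private final ObjectMetadata metadata;

    PhoneNumberHandler(ObjectMetadata metadata) {
        this.metadata = metadata;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may deliver one text node in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // '\n' is not in CANDIDATE, so numbers in separate elements never merge
        text.append('\n');
    }

    @Override
    public void endDocument() {
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            metadata.add(KEY, new PhoneNumberCandidate(m.group()));
        }
    }

    /** Convenience: parse a well-formed XHTML string and return the populated metadata. */
    static ObjectMetadata extract(String xhtml) {
        ObjectMetadata md = new ObjectMetadata();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new PhoneNumberHandler(md));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return md;
    }
}
```

      For example, PhoneNumberHandler.extract("<p>Call +1 201-555-0123 or (020) 7946 0018.</p>") yields two PhoneNumberCandidate values under "phonenumbers". Scanning once at endDocument(), rather than per characters() call, is the design choice that makes chunked text and multiple numbers per document safe.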

      https://github.com/googlei18n/libphonenumber/

          People

            lewismc Lewis John McGibbney