TIKA-1106: CLAVIN Integration

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3
    • Fix Version/s: 1.16
    • Component/s: parser
    • Environment: All

      Description

      I've been evaluating CLAVIN as a way to extract location information from unstructured text. Integrating it with Tika in some way seems like it would make a lot of sense. From the CLAVIN website:

      CLAVIN (Cartographic Location And Vicinity INdexer) is an open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution. It combines a variety of open source tools with natural language processing techniques to extract location names from unstructured text documents and resolve them against gazetteer records. Importantly, CLAVIN does not simply "look up" location names; rather, it uses intelligent heuristics in an attempt to identify precisely which "Springfield" (for example) was intended by the author, based on the context of the document. CLAVIN also employs fuzzy search to handle incorrectly-spelled location names, and it recognizes alternative names (e.g., "Ivory Coast" and "Côte d'Ivoire") as referring to the same geographic entity. By enriching text documents with structured geo data, CLAVIN enables hierarchical geospatial search and advanced geospatial analytics on unstructured data.

      I found only one other mention of the word "clavin" on the ASF JIRA site, so I thought it was definitely worth posting here.

      https://github.com/Berico-Technologies/CLAVIN
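The context-based resolution described in the CLAVIN excerpt above can be illustrated with a small toy sketch. This is not CLAVIN's API or algorithm: the `Place` record, the three-entry `GAZETTEER`, the `candidates` and `resolve` helpers, and the "region name appears in the text" heuristic are all assumptions made for illustration, and the fuzzy lookup uses Python's standard `difflib` rather than whatever search CLAVIN employs.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    """One toy gazetteer record (hypothetical schema, not CLAVIN's)."""
    place_id: int
    name: str
    region: str
    alt_names: Tuple[str, ...] = ()


# Tiny hand-written gazetteer, for illustration only.
GAZETTEER = [
    Place(1, "Springfield", "Illinois"),
    Place(2, "Springfield", "Massachusetts"),
    Place(3, "Côte d'Ivoire", "Africa", alt_names=("Ivory Coast",)),
]


def candidates(name: str) -> List[Place]:
    """Return records whose primary or alternative name matches `name`."""
    index = {}
    for place in GAZETTEER:
        for known in (place.name, *place.alt_names):
            index.setdefault(known.casefold(), []).append(place)
    key = name.casefold()
    if key in index:
        return index[key]
    # Fuzzy fallback for misspellings (assumed similarity cutoff of 0.8).
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=0.8)
    return index[close[0]] if close else []


def resolve(name: str, context: str) -> Optional[Place]:
    """Prefer the candidate whose region is mentioned in the surrounding text."""
    found = candidates(name)
    if not found:
        return None
    ctx = context.casefold()
    # max() keeps the first candidate when no region (or several) match.
    return max(found, key=lambda p: p.region.casefold() in ctx)
```

Under these assumptions, `resolve("Springfield", text)` would pick the Massachusetts record when `text` mentions Massachusetts, and `resolve("Ivory Coast", text)` would return the same record as "Côte d'Ivoire". A real integration would replace the toy gazetteer and heuristic with CLAVIN's own resolver.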

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Adam Estrada (adamestrada)
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:
                Resolved: