Tika / TIKA-1913

Integrate MIT Information Extraction (MITIE) into Tika to perform Named Entity Recognition

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.13
    • Component/s: parser

      Description

      Hello folks,

      MIT Information Extraction (MITIE) provides free, state-of-the-art information extraction tools, including named entity recognition and binary relation detection.

      It ships with pre-trained models and with functions for training new models.
      I propose that we use their precompiled JAR or Maven Central artifact and supply it at runtime to enable MITIE Named Entity Recognition in Tika.
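
To illustrate what "supply it at runtime" could look like, here is a minimal sketch of optional binding through reflection, so that the calling code compiles and runs even when the MITIE JAR is absent. The helper class `RuntimeBinding`, the class name `edu.mit.ll.mitie.NamedEntityExtractor`, its single-String constructor, and the model file name are assumptions made for illustration only; this is not the code that was committed for this issue.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

/** Sketch of optional runtime binding: use a class only if its JAR is on the classpath. */
final class RuntimeBinding {

    private RuntimeBinding() {}

    /** Returns true if the named class can be located, without running its static initializers. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, RuntimeBinding.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Reflectively calls the public single-String constructor; empty if anything fails. */
    static Optional<Object> newInstance(String className, String arg) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getConstructor(String.class);
            return Optional.of(ctor.newInstance(arg));
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed MITIE class name and hypothetical model path; neither is confirmed by this issue.
        String extractorClass = "edu.mit.ll.mitie.NamedEntityExtractor";
        String modelPath = args.length > 0 ? args[0] : "ner_model.dat";

        if (!isAvailable(extractorClass)) {
            System.out.println("MITIE not on classpath; NER disabled.");
            return;
        }
        System.out.println(newInstance(extractorClass, modelPath).isPresent()
                ? "MITIE extractor loaded."
                : "MITIE found but could not be initialised.");
    }
}
```

With this approach the host project needs no compile-time dependency on MITIE: users who want MITIE-based NER add the JAR and a model file themselves, and everyone else gets a clean fallback.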

        Activity

        chrismattmann Chris A. Mattmann added a comment -

        Thanks Manali Shah and Thamme Gowda!

        LMC-053601:tika1.13 mattmann$ git push -u origin master
        Counting objects: 102, done.
        Delta compression using up to 8 threads.
        Compressing objects: 100% (70/70), done.
        Writing objects: 100% (102/102), 9.41 KiB | 0 bytes/s, done.
        Total 102 (delta 30), reused 0 (delta 0)
        remote: tika git commit: record changes for MITIE and TIKA-1913.
        remote: tika git commit: Merge branch 'TIKA-1913' of https://github.com/manalishah/tika into TIKA-1913
        remote: tika git commit: model path flaw
        remote: tika git commit: removed starred imports
        remote: tika git commit: Merge remote-tracking branch 'upstream/master' into TIKA-1913
        remote: tika git commit: removed logs
        remote: tika git commit: code cleanup
        remote: tika git commit: runtime binding to mitie
        remote: tika git commit: mitie ner parser added
        To https://git-wip-us.apache.org/repos/asf/tika.git
           e2fdcaa..f827026  master -> master
        Branch master set up to track remote branch master from origin.
        LMC-053601:tika1.13 mattmann$ 
        
        chrismattmann Chris A. Mattmann added a comment -

        Documentation is here: https://github.com/manalishah/mitie-resources

          People

          • Assignee: chrismattmann Chris A. Mattmann
          • Reporter: manalishah.91@gmail.com Manali Shah
          • Votes: 0
          • Watchers: 2

              Time Tracking

              • Original Estimate: 168h
              • Remaining Estimate: 168h
              • Time Spent: Not Specified
