  1. Stanbol (Retired)
  2. STANBOL-90

Create a maven artifact to embed all the default stanbol models data

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0-incubating
    • Component/s: None
    • Labels: None

    Description

      To make Stanbol useful, especially in offline mode, it needs some statistical models and entity/topic indices. Those indices can be huge (several GB for all the entities of DBpedia and GeoNames, for instance) and hence cannot be packaged as part of the default distribution. However, it is very desirable to embed some default statistical models and small indices:

      • OpenNLP sentence detector for English
      • OpenNLP name finder models for English for organizations, people, and places
      • Solr index for the top 10000 most popular entities (of type organization, person, or place), as measured by the number of incoming links in the Wikipedia article graph
      • Solr index for the top 1000 most popular topics, as measured by the number of Wikipedia articles categorized in the corresponding category or one of its subcategories
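
The popularity criterion in the last two items amounts to a top-N selection over precomputed counts. A minimal sketch in Python, assuming the incoming-link counts are already available as a dictionary; the function name `top_entities` and the sample data are hypothetical and not part of Stanbol:

```python
import heapq


def top_entities(link_counts, n):
    """Return the names of the n entities with the most incoming links.

    link_counts maps an entity name to its incoming-link count.
    How the counts are computed is outside the scope of this sketch.
    """
    # nlargest avoids sorting the full mapping when n is much smaller
    # than the number of entities.
    best = heapq.nlargest(n, link_counts.items(), key=lambda item: item[1])
    return [name for name, _count in best]


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {"Paris": 50, "London": 40, "Berlin": 30, "Oslo": 5}
    print(top_entities(sample, 2))
```

The same selection would apply to topics, with the count being the number of articles in a category and its subcategories.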

      The goal is to keep that Maven artifact under 100 MB (ideally even smaller) so that it does not create a big barrier to entry for people downloading the default distribution of Stanbol.
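
One way to keep an eye on that size goal is to sum the data files before packaging. This is only a sketch: the helper names are hypothetical, the limit is interpreted as 100 × 1024 × 1024 bytes, and the uncompressed total is not the same as the size of the final jar, which depends on compression.

```python
import os

# Size goal from the description, interpreted here as 100 MiB (an assumption).
MAX_ARTIFACT_BYTES = 100 * 1024 * 1024


def total_size(root):
    """Return the summed size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def within_budget(root, limit=MAX_ARTIFACT_BYTES):
    """Return True if the files under root fit within the size limit."""
    return total_size(root) <= limit
```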

      To avoid slowing down the svn repo, those data files will not be put under version control; only the pom.xml and a script to rebuild the artifact from a previous version of the jar will be committed.
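
The rebuild step could start by pulling the data files out of the previously published jar, since a jar is a zip archive. A rough sketch under stated assumptions: the function name `extract_data_files` and the `data/` prefix are hypothetical, and the issue does not say how the jar is laid out or how the actual script works.

```python
import zipfile


def extract_data_files(jar_path, dest_dir, prefix="data/"):
    """Extract entries under prefix from a previous jar into dest_dir.

    The "data/" default is an assumed layout, not taken from the issue.
    Returns the sorted list of extracted entry names.
    """
    with zipfile.ZipFile(jar_path) as jar:
        # Keep only file entries below the prefix; skip directory entries
        # and unrelated content such as META-INF.
        members = [
            name
            for name in jar.namelist()
            if name.startswith(prefix) and not name.endswith("/")
        ]
        jar.extractall(dest_dir, members=members)
    return sorted(members)
```

The extracted directory could then be repackaged by the Maven build without the data ever entering version control.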

          People

            Assignee: Olivier Grisel (ogrisel)
            Reporter: Olivier Grisel (ogrisel)
            Votes: 0
            Watchers: 0
