[STANBOL-90] Create a maven artifact to embed all the default stanbol models data - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.0-incubating
Component/s: None
Labels:
None

Description

To make Stanbol useful, especially in offline mode, it needs some statistical models and entity / topic indices. Those indices can be huge (several GB for all the entities of DBpedia and GeoNames, for instance) and hence cannot be packaged as part of the default distribution. However, it is very desirable to embed some default statistical models and indices:

opennlp sentence detector for English
opennlp name finder models for English for organizations, people, places
solr index for the top 10000 most popular entities (of type organizations, people, places) as measured by number of incoming links in the Wikipedia article graph.
solr index for the top 1000 most popular topics, as measured by the number of Wikipedia articles categorized in the corresponding category or one of its subcategories.
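
The "most popular entities by incoming links" selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `top_entities_by_incoming_links`, the `(source, target)` pair representation of the link graph, and the sample data are all hypothetical and are not taken from the Stanbol code base or from the actual indexing scripts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_entities_by_incoming_links(
    links: Iterable[Tuple[str, str]], limit: int
) -> List[Tuple[str, int]]:
    """Rank link targets by number of incoming links (hypothetical helper).

    `links` is assumed to be an iterable of (source, target) article pairs.
    """
    # Count how many times each article appears as a link target.
    incoming = Counter(target for _source, target in links)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(incoming.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Made-up sample graph, for illustration only.
sample_links = [
    ("Paris", "France"),
    ("Lyon", "France"),
    ("Berlin", "Germany"),
    ("France", "Europe"),
    ("Germany", "Europe"),
    ("Paris", "Europe"),
]

top = top_entities_by_incoming_links(sample_links, 2)
print(top)
```

For the real artifact the limit would be 10000 for entities, and the same ranking idea would apply to topics with category membership counts in place of incoming links.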

The goal is to keep that maven artifact smaller than 100 MB (ideally much smaller) so that it does not put a big barrier to entry in front of people downloading the default distribution of Stanbol.

To avoid slowing down the svn repo, those data files will not be put under version control; only the pom.xml and a script to rebuild the artifact from a previous version of the jar will be committed.

Sub-Tasks

1.	package english opennlp models	Closed	Olivier Grisel
2.	package solr index for popular entities	Closed	Unassigned
3.	package solr index for popular topics	Closed	Unassigned

People

Assignee: Olivier Grisel

Reporter: Olivier Grisel

Dates

Created: 16/Feb/11 11:53

Updated: 09/May/12 13:47

Resolved: 06/Jan/12 14:43
