Project: Stanbol (Retired)
Parent issue: STANBOL-1014 Create Entityhub Indexing Tool for freebase.com
Issue: STANBOL-1016

Add RDF Triple Filter support to the Jena TDB Indexing Source


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: Entityhub
    • Labels: None

    Description

      The freebase.com dump has ~1,200,000,000 triples. Loading these triples into Jena TDB is extremely slow if the RAM available to the memory-mapped files is not large enough to hold the data. Once the number of imported triples exceeds the available RAM, the import speed decreases to ~7k triples/sec on an SSD. To sustain those 7k triples/sec, the logs show 1.5k reads and 1k writes per second, so import speeds on conventional hard disks are expected to be much slower.
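      For a rough sense of scale, assuming (optimistically) that the ~7k triples/sec SSD rate held for the entire dump, a full import would take about two days:

```latex
t \approx \frac{1.2 \times 10^{9}\ \text{triples}}{7 \times 10^{3}\ \text{triples/s}}
  \approx 1.71 \times 10^{5}\ \text{s} \approx 47.6\ \text{h} \approx 2\ \text{days}
```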

      As most of the triples contained in the freebase dump are not relevant for indexing, this issue will introduce a new feature to the Jena TDB Indexing Source that allows triples to be filtered out at a very low level, before they are loaded into the TDB store.

      The filter will operate on the triples provided by the RIOT parser and define a single method:

      accept(Node subject, Node predicate, Node object) : boolean
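      A minimal sketch of such a filter, assuming plain strings in place of Jena's Node type so that it compiles without Jena on the classpath. The names TripleFilter, PredicateWhitelistFilter and FilterDemo are hypothetical and are not the Stanbol API; the example only shows how a single accept(..) method lets the import drop triples whose predicate is not needed for indexing.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the filter described above.
// Plain strings replace Jena's Node type so the sketch has no dependencies.
interface TripleFilter {
    /** Returns true if the triple should be imported into the TDB store. */
    boolean accept(String subject, String predicate, String object);
}

// Hypothetical example filter: keeps only triples whose predicate is whitelisted.
final class PredicateWhitelistFilter implements TripleFilter {
    private final Set<String> allowedPredicates;

    PredicateWhitelistFilter(Set<String> allowedPredicates) {
        this.allowedPredicates = Set.copyOf(allowedPredicates);
    }

    @Override
    public boolean accept(String subject, String predicate, String object) {
        return allowedPredicates.contains(predicate);
    }
}

final class FilterDemo {
    // Counts how many of the given {subject, predicate, object} triples pass the filter.
    static long countAccepted(TripleFilter filter, List<String[]> triples) {
        return triples.stream()
                .filter(t -> filter.accept(t[0], t[1], t[2]))
                .count();
    }

    public static void main(String[] args) {
        TripleFilter filter = new PredicateWhitelistFilter(
                Set.of("http://www.w3.org/2000/01/rdf-schema#label"));
        List<String[]> triples = List.of(
                new String[] {"ex:berlin", "http://www.w3.org/2000/01/rdf-schema#label", "\"Berlin\""},
                new String[] {"ex:berlin", "ex:notableFor", "ex:city"});
        System.out.println(countAccepted(filter, triples)); // prints 1
    }
}
```

      Because the check runs on each parsed triple, rejected triples never reach the TDB store, which is where the expensive reads and writes happen.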

      In addition, the interface will extend IndexingComponent, which allows it to be configured via the configuration file of the

      org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource

      The parameter used to configure the filter will be called "import-filter", and its value MUST be the class name of the filter implementation to use.
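      One plausible way for the source to turn the "import-filter" value into an instance is plain Java reflection. The sketch below assumes the filter class has a public no-argument constructor and treats a missing property as "no filter"; ImportFilter, AcceptAllFilter and ImportFilterLoader are hypothetical stand-ins, not Stanbol classes.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the import filter interface
// (strings instead of Jena Nodes).
interface ImportFilter {
    boolean accept(String subject, String predicate, String object);
}

// Hypothetical filter that accepts every triple (same effect as no filter).
final class AcceptAllFilter implements ImportFilter {
    public AcceptAllFilter() {}

    @Override
    public boolean accept(String subject, String predicate, String object) {
        return true;
    }
}

final class ImportFilterLoader {
    static final String IMPORT_FILTER = "import-filter";

    // Instantiates the class named by the "import-filter" property.
    // Assumption: an absent property means "no filter" and yields null.
    static ImportFilter load(Map<String, Object> config) {
        Object value = config.get(IMPORT_FILTER);
        if (value == null) {
            return null;
        }
        String className = value.toString();
        try {
            return Class.forName(className)
                    .asSubclass(ImportFilter.class) // ClassCastException if not a filter
                    .getDeclaredConstructor()       // requires a no-arg constructor
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate import filter '" + className + "'", e);
        }
    }
}
```

      Looking the class up by name keeps the indexing source free of compile-time dependencies on specific filter implementations.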

      The configuration of the jenatdb.RdfIndexingSource will be passed to the import filter's #setConfiguration(..) method. This means that users need to add the configuration properties for the import filter to the configuration of the RdfIndexingSource.
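      The sketch below illustrates what forwarding the source configuration means in practice: the filter receives the whole configuration map of the indexing source and picks out its own key. The key "filter.excluded-predicates" and all type names are hypothetical, chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a filter that is also configurable, mirroring the idea
// that the real filter extends a configurable indexing component.
interface ConfigurableImportFilter {
    void setConfiguration(Map<String, Object> config);
    boolean accept(String subject, String predicate, String object);
}

// Hypothetical filter that rejects triples whose predicate is listed under the
// made-up key "filter.excluded-predicates" as a comma-separated string.
final class ExcludePredicatesFilter implements ConfigurableImportFilter {
    static final String EXCLUDED_KEY = "filter.excluded-predicates";
    private final Set<String> excluded = new HashSet<>();

    @Override
    public void setConfiguration(Map<String, Object> config) {
        excluded.clear();
        Object value = config.get(EXCLUDED_KEY);
        if (value != null) {
            for (String p : value.toString().split(",")) {
                if (!p.isBlank()) {
                    excluded.add(p.trim());
                }
            }
        }
    }

    @Override
    public boolean accept(String subject, String predicate, String object) {
        return !excluded.contains(predicate);
    }
}

final class ForwardingDemo {
    public static void main(String[] args) {
        // One configuration map for the indexing source; the filter's own key
        // sits next to the source's keys, including "import-filter".
        Map<String, Object> sourceConfig = Map.of(
                "import-filter", ExcludePredicatesFilter.class.getName(),
                ExcludePredicatesFilter.EXCLUDED_KEY, "ex:p1, ex:p2");
        ConfigurableImportFilter filter = new ExcludePredicatesFilter();
        filter.setConfiguration(sourceConfig); // the whole map is forwarded unchanged
        System.out.println(filter.accept("ex:s", "ex:p2", "ex:o")); // prints false
    }
}
```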

      To keep things simple, the RdfImportFilter interface will be specific to the Jena TDB Indexing Source.

People

    • Assignee: Rupert Westenthaler (rwesten)
    • Reporter: Rupert Westenthaler (rwesten)