  1. Marmotta (Retired)
  2. MARMOTTA-469

KiWi: Hashing-Based ID Generation

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: KiWi Triple Store
    • Labels: None

    Description

      The KiWi triple store currently generates unique IDs for nodes and triples using a kind of sequence generator. Snowflake, for example, is generally very fast, but ensuring that the same object always gets the same ID requires a lot of synchronization (immediate commit for nodes, triple registry for triples). This has a considerable performance impact, particularly in clustered environments.

      A much faster approach would be to compute the ID from the objects themselves, e.g. using an efficient, well-distributed hash function. With a 64-bit hash, the probability of a collision becomes serious at around 2 billion objects (about 10%), so it might make sense to switch to 128-bit keys as well.
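
To make the idea concrete, here is a minimal sketch of deriving IDs from content. Everything in it is an assumption for illustration: the class and method names (`HashIdSketch`, `nodeId`, `tripleId`), the canonical form of a node (a kind tag plus its lexical value), and the choice of SHA-256 truncated to 64 bits. SHA-256 is used only because it ships with the JDK; a faster non-cryptographic hash would probably be preferable in practice, and a 128-bit variant would keep the first 16 digest bytes instead of 8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not part of the KiWi API.
final class HashIdSketch {

    // Hash arbitrary bytes with SHA-256 and keep the first 8 bytes as a 64-bit ID.
    static long hash64(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Assumed canonical form of a node: a one-character kind tag
    // (e.g. 'U' for URI, 'L' for literal) followed by the lexical value.
    static long nodeId(char kind, String value) {
        return hash64((kind + "|" + value).getBytes(StandardCharsets.UTF_8));
    }

    // A triple ID is derived from the IDs of subject, predicate, object and context.
    static long tripleId(long subject, long predicate, long object, long context) {
        ByteBuffer buf = ByteBuffer.allocate(4 * Long.BYTES);
        buf.putLong(subject).putLong(predicate).putLong(object).putLong(context);
        return hash64(buf.array());
    }

    public static void main(String[] args) {
        long s = nodeId('U', "http://example.org/alice");
        long p = nodeId('U', "http://xmlns.com/foaf/0.1/name");
        long o = nodeId('L', "Alice");
        // The same inputs always produce the same IDs, on any cluster node,
        // without consulting a shared registry.
        System.out.println(Long.toHexString(tripleId(s, p, o, 0L)));
    }
}
```

Because the ID depends only on the content, two cluster members that create the same node independently arrive at the same ID without any coordination.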

      A good overview of hash collision probabilities is given in:

      http://preshing.com/20110504/hash-collision-probabilities/
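
The 10% figure quoted above can be checked with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / (2·2^bits)). The sketch below is only an illustration; the class and method names are made up for this example.

```java
// Hypothetical helper for estimating hash collision probabilities.
final class CollisionSketch {

    // Birthday approximation for n objects hashed into a space of 2^bits values.
    // Math.expm1 keeps precision when the probability is extremely small.
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        return -Math.expm1(-n * (n - 1.0) / (2.0 * space));
    }

    public static void main(String[] args) {
        double n = 2e9; // 2 billion objects
        System.out.println("64-bit:  " + collisionProbability(n, 64));
        System.out.println("128-bit: " + collisionProbability(n, 128));
    }
}
```

For 2 billion objects this gives roughly 0.10 with 64-bit keys, while with 128-bit keys the estimate is many orders of magnitude smaller.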

      Changes would affect the API for ID generation (IDGenerator) as well as the value factory. In addition, we would need to ignore duplicate IDs for database inserts, e.g. using triggers or merge. Finally, we need to rethink the behaviour of deleted/non-deleted triples.
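
The "ignore duplicate IDs" requirement amounts to making inserts idempotent. The following in-memory sketch only models that behaviour with a map; it is not how KiWi persists data, and the class and method names are hypothetical. In a real database the same effect would come from the triggers or merge statements mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a table keyed by hash-based ID.
final class IdempotentInsertSketch {

    private final ConcurrentMap<Long, String> rows = new ConcurrentHashMap<>();

    // Stores the row only if the ID is not present yet.
    // Returns true if the row was newly stored, false if the ID already existed.
    boolean insertIgnoringDuplicate(long id, String payload) {
        return rows.putIfAbsent(id, payload) == null;
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        IdempotentInsertSketch table = new IdempotentInsertSketch();
        System.out.println(table.insertIgnoringDuplicate(42L, "node")); // first insert is stored
        System.out.println(table.insertIgnoringDuplicate(42L, "node")); // duplicate is ignored
        System.out.println(table.size());
    }
}
```

With content-derived IDs, a duplicate insert carries the same content as the existing row, so silently dropping it loses nothing unless two different objects collide on the same hash.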

          People

            wastl Sebastian Schaffert