Apache Blur / BLUR-445

Remove online mutates from the Blur thrift api

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.3.0
    • Component/s: Blur
    • Labels: None

    Description

      The primary use case for Blur is massive ingestion of information to be indexed and searched. I believe the system has become overly complex because of the atomic operations in the online index mutation system. These operations force the shard servers to keep writers open to each of the indexes in a given table, which requires a lot of memory, CPU, and file resources per shard.

      Currently, mutates are atomic only when they touch a single row. Batch mutates are not atomic.

      I propose that we move all index mutations to the bulk indexing approach and use HDFS snapshots for committing index information within a given table. This will allow the controller and shard servers to become read-only with respect to the indexes.

      Assuming we move forward with this approach, a new daemon will need to be created: an index manager. This daemon will coordinate indexing (MR, Spark, Tez, Flink, etc.) and merging globally for the cluster.
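
To make the proposed division of responsibilities concrete, here is a minimal in-memory sketch in Python. It is not Blur code and does not touch HDFS: the names `Table`, `IndexManager`, `ShardServer`, `bulk_index`, and `fetch_row` are all hypothetical, and a frozen dictionary stands in for an HDFS snapshot. It only illustrates the idea that a single writer builds a new index generation offline and publishes it in one step, while the shard servers read whatever generation is currently committed.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, Mapping, Optional


@dataclass
class Table:
    """Hypothetical model of one table: committed snapshots plus a pointer."""
    snapshots: Dict[str, Mapping[str, dict]] = field(default_factory=dict)
    current: Optional[str] = None  # name of the snapshot that readers see


class IndexManager:
    """Stand-in for the proposed index manager daemon (the only writer)."""

    def __init__(self, table: Table) -> None:
        self._table = table
        self._generation = 0

    def bulk_index(self, rows: Mapping[str, dict]) -> str:
        # Stage the next generation offline, starting from the committed data.
        base = self._table.snapshots.get(self._table.current, {})
        staged = {**base, **rows}
        self._generation += 1
        name = f"gen-{self._generation}"
        # Commit: freeze the staged data, then flip the pointer. Readers see
        # either the whole batch or none of it. Older snapshots stay untouched.
        self._table.snapshots[name] = MappingProxyType(staged)
        self._table.current = name
        return name


class ShardServer:
    """Read-only view: it never opens a writer, it only follows `current`."""

    def __init__(self, table: Table) -> None:
        self._table = table

    def fetch_row(self, row_id: str) -> Optional[dict]:
        if self._table.current is None:
            return None
        return self._table.snapshots[self._table.current].get(row_id)
```

In this model a whole batch becomes visible at once when the pointer moves, which is the property that batch mutates currently lack. A real implementation would have to decide how shard servers learn about a new snapshot and when old snapshots can be deleted; the sketch leaves both open.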

          People

            Assignee: Unassigned
            Reporter: Aaron McCurry (amccurry)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: