[SOLR-11299] Time partitioned collections (umbrella issue) - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: SolrCloud
Labels:
None

Description

Solr ought to be able to manage large-scale time-series data (think logs or sensor/IoT data) itself, without a lot of manual or external work. The most naive and painless approach today is to create a collection with a high numShards and hash routing, but this isn't as good as partitioning the underlying indexes by time, for these reasons:

- Easy to scale up or down horizontally as data or requirements change (no need to over-provision, use shard splitting, or re-index with a different config).
- Faster queries:
  - can search fewer shards, reducing overall load
  - realtime search is more tractable, since most shards are stable and so keep good caches
  - "recent" shards (that might be queried more) can be allocated to faster hardware
  - aged-out data is simply removed, not marked as deleted; deleted docs still have search overhead
- An outage of a shard results in a degraded but sometimes still useful system (compare to a random subset of documents missing).
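
To illustrate the "search fewer shards" point, here is a minimal sketch of how a time-range query could be narrowed to only the partitions that can contain matching documents. The function name `partitions_for_range` and the representation of partitions as a list of start times are assumptions made for illustration; this is not Solr code.

```python
from datetime import datetime, timezone


def partitions_for_range(partition_starts, query_start, query_end):
    """Return the start times of partitions that may hold documents in
    the half-open range [query_start, query_end).

    Hypothetical model: each partition covers [start, next_start), and
    the newest partition is open-ended.
    """
    starts = sorted(partition_starts)
    selected = []
    for i, start in enumerate(starts):
        # The end of a partition is the start of the next one, if any.
        end = starts[i + 1] if i + 1 < len(starts) else None
        overlaps = start < query_end and (end is None or end > query_start)
        if overlaps:
            selected.append(start)
    return selected


# Four daily partitions; a query spanning parts of Jan 2 and Jan 3
# touches only two of them.
days = [datetime(2018, 1, d, tzinfo=timezone.utc) for d in (1, 2, 3, 4)]
hit = partitions_for_range(
    days,
    datetime(2018, 1, 2, 6, tzinfo=timezone.utc),
    datetime(2018, 1, 3, 12, tzinfo=timezone.utc),
)
```

With hash routing, the same query would have to be sent to every shard, because any shard could hold a document from any time.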

Ideally you could set this up once and then simply work with a collection (potentially actually an alias) in the normal way (search or update), letting Solr handle the addition of new partitions, the removal of old ones, and the appropriate routing of requests depending on their nature.
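
The behaviour described above (route by time, add new partitions, remove old ones) can be sketched as a small in-memory model. Everything here is hypothetical: the class `TimePartitionedAlias`, its parameters `interval` and `max_age`, and the collection naming scheme are assumptions for illustration, not Solr's implementation or API.

```python
from datetime import datetime, timedelta, timezone


class TimePartitionedAlias:
    """Toy model of an alias over fixed-interval time partitions."""

    def __init__(self, name, start, interval, max_age=None):
        self.name = name
        self.interval = interval      # width of each partition
        self.max_age = max_age        # None means never expire
        self.starts = [start]         # partition start times, ascending

    def collection_name(self, start):
        # Hypothetical naming scheme: alias name plus partition start.
        return f"{self.name}_{start:%Y-%m-%d_%H-%M}"

    def route(self, ts):
        """Return the collection for a document timestamp, adding new
        partitions when ts falls past the end of the newest one."""
        if ts < self.starts[0]:
            raise ValueError("timestamp predates the oldest partition")
        while ts >= self.starts[-1] + self.interval:
            self.starts.append(self.starts[-1] + self.interval)
        target = max(s for s in self.starts if s <= ts)
        return self.collection_name(target)

    def expire(self, now):
        """Drop partitions whose whole range is older than max_age.
        The newest partition is always kept."""
        if self.max_age is None:
            return []
        cutoff = now - self.max_age
        removed = [s for s in self.starts[:-1] if s + self.interval <= cutoff]
        self.starts = [s for s in self.starts if s not in removed]
        return [self.collection_name(s) for s in removed]


utc = timezone.utc
alias = TimePartitionedAlias(
    "logs", datetime(2018, 1, 1, tzinfo=utc),
    interval=timedelta(days=1), max_age=timedelta(days=2),
)
first = alias.route(datetime(2018, 1, 1, 10, tzinfo=utc))
later = alias.route(datetime(2018, 1, 4, 5, tzinfo=utc))   # adds Jan 2 to Jan 4
dropped = alias.expire(now=datetime(2018, 1, 4, 12, tzinfo=utc))
```

In this sketch, expiring a partition drops it whole, which mirrors the point that aged-out data is removed rather than marked as deleted.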

This is an umbrella issue for the particular tasks that will make it all happen, tracked either as subtasks or through issue links.

Issue Links

duplicates

SOLR-11536 Solr Index Partition by Timestamp(day)

Closed

incorporates

SOLR-11444 Improve Aliases.java and comma delimited collection list handling

Closed

is related to

SOLR-9690 Date/Time DocRouter

Resolved

SOLR-12295 Time Routed Aliases: Delete obsolete collections Async

Open

SOLR-9562 Minimize queried collections for time series alias

Reopened

relates to

SOLR-12308 LISTALIASES should return up to date response

Closed

(1 relates to)

Sub-Tasks

1.

Collection Alias metadata for time partitioned collections

Resolved

David Smiley

2.

Add URP to route time partitioned collections

Closed

David Smiley

3.

Expose Alias Properties CRUD in REST API

Closed

David Smiley

100%

Original Estimate - Not Specified

Time Spent - 2h 20m

4.

create next time collection based on a fixed time gap

Closed

David Smiley

5.

TimePartitionedUpdateProcessor.lookupShardLeaderOfCollection should route to the ideal shard

Closed

David Smiley

6.

API to create a Time Routed Alias and first collection

Closed

David Smiley

100%

Original Estimate - Not Specified

Time Spent - 50m

7.

API command to delete oldest collections in a time routed alias

Open

Unassigned

8.

Auto delete oldest collections in a time routed alias

Closed

David Smiley

9.

Create Time Routed Alias stress-test

Open

Unassigned

10.

Exception Class to identify out of range docs vs other errors

Open

Unassigned

11.

add option for deleting an alias to delete collections first

Open

Unassigned

12.

Document Time Routed Aliases separate from API

Closed

David Smiley

13.

TRA: Pre-emptively create next collection

Closed

David Smiley

100%

Original Estimate - Not Specified

Time Spent - 9.5h

14.

TRA: evaluate autoDeleteAge independently of when collections are created

Open

Unassigned

100%

Original Estimate - Not Specified

Time Spent - 20m

15.

TRA: document re-dating (question, test, docs)

Open

Unassigned

16.

Optimize Queries when sorting by router.field

Patch Available

Gus Heck

100%

Original Estimate - Not Specified

Time Spent - 10m

17.

Optimize Queries when query filtering by TRA router.field

Patch Available

Gus Heck

People

Assignee: David Smiley

Reporter: David Smiley

Votes: 4

Watchers: 15

Dates

Created: 30/Aug/17 18:32

Updated: 08/Jun/19 15:20

Time Tracking

Estimated:

Not Specified

Remaining:

0h

Logged:

13h 10m
