  1. Solr
  2. SOLR-9562

Minimize queried collections for time series alias

Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved

    Description

      When indexing time series data (such as large volumes of log data), we can create a new collection at a regular interval (hourly, daily, etc.) behind a write alias, and create a read alias that spans all of those collections. However, every collection in the read alias is queried even when the search covers only a very narrow time window. In that case, the matching docs may be stored in only a small fraction of the collections, so querying all of them is unnecessary.

      I suggest this patch, which lets a read alias minimize the collections it queries. It adds three parameters to the CREATEALIAS action.

      Key             Type    Required  Default  Description
      timeField       string  No                 The name of the time field for the time series data. It should be a date type.
      dateTimeFormat  string  No                 The timestamp format used when collections are created. Every collection name should have a suffix (starting with "_") in this format.
                                                 Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927
                                                 See DateTimeFormatter.
      timeZone        string  No                 The time zone for the dateTimeFormat parameter.
                                                 Ex. GMT+9.
                                                 See DateTimeFormatter.
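
      As a rough illustration of how the three parameters might be passed, the sketch below builds a CREATEALIAS request URL. The `action`, `name`, and `collections` parameters are the existing Collections API ones; `timeField`, `dateTimeFormat`, and `timeZone` are the parameters proposed in this issue and only exist with the patch applied. The class name, base URL, alias name, and field name are made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or of the attached patch.
class CreateAliasUrl {

    static String build(String baseUrl, String alias, String collections,
                        String timeField, String dateTimeFormat, String timeZone) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATEALIAS");
        params.put("name", alias);
        params.put("collections", collections);
        // The three parameters below are the ones proposed in this issue.
        params.put("timeField", timeField);
        params.put("dateTimeFormat", dateTimeFormat);
        params.put("timeZone", timeZone);

        // URL-encode every value so that "+" and "," survive the query string.
        String query = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        // Assumed local Solr node and example names.
        System.out.println(build("http://localhost:8983/solr", "logs_read",
            "col_20160926,col_20160927", "timestamp", "yyyyMMdd", "GMT+9"));
    }
}
```

      Note that the "+" in a value such as GMT+9 has to be URL-encoded, otherwise it would be decoded as a space on the server side.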

      Then, when we query with a filter query such as "timeField:[fromTime TO toTime]", only the collections that hold docs in the given time range are queried.
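
      The selection step can be sketched as follows. This is not the code from the attached patch; it is a minimal illustration that assumes each collection holds docs from the time encoded in its suffix up to the start of the next collection, and that the newest collection is open-ended. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the implementation in SOLR-9562.patch.
class TimeSeriesAliasRouter {

    // Parses the part after the last "_" as the collection's start time.
    static Instant startOf(String collection, DateTimeFormatter fmt, ZoneId zone) {
        String suffix = collection.substring(collection.lastIndexOf('_') + 1);
        return LocalDateTime.parse(suffix, fmt).atZone(zone).toInstant();
    }

    // Returns the collections whose assumed time span overlaps [from, to].
    static List<String> select(List<String> collections, String dateTimeFormat,
                               String timeZone, Instant from, Instant to) {
        // Default the fields that a coarse pattern such as yyyyMMdd leaves out.
        DateTimeFormatter fmt = new DateTimeFormatterBuilder()
            .appendPattern(dateTimeFormat)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .toFormatter(Locale.ROOT);
        ZoneId zone = ZoneId.of(timeZone);

        // Order collections by start time so each one ends where the next begins.
        TreeMap<Instant, String> byStart = new TreeMap<>();
        for (String c : collections) {
            byStart.put(startOf(c, fmt, zone), c);
        }

        List<String> selected = new ArrayList<>();
        for (Map.Entry<Instant, String> e : byStart.entrySet()) {
            Instant start = e.getKey();
            Instant end = byStart.higherKey(start); // null for the newest collection
            boolean startsBeforeRangeEnds = !start.isAfter(to);
            boolean endsAfterRangeStarts = end == null || end.isAfter(from);
            if (startsBeforeRangeEnds && endsAfterRangeStarts) {
                selected.add(e.getValue());
            }
        }
        return selected;
    }
}
```

      For example, with daily collections col_20160925, col_20160926, and col_20160927 in GMT+9, a range of 2016-09-26T03:00:00Z to 2016-09-26T05:00:00Z falls entirely inside col_20160926, so only that collection would be queried under these assumptions.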

      Attachments

        1. SOLR-9562.patch
          37 kB
          Eungsop Yoo
        2. SOLR-9562-v2.patch
          39 kB
          Eungsop Yoo

            People

              Assignee: Unassigned
              Reporter: Eungsop Yoo
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated: