DRILL-7363: OpenTSDB Storage Plugin - Speed Up Query Planning


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Storage - Other
    • Labels:
      None

      Description

      In the current implementation of the OpenTSDB storage plugin, simple queries that should return within 100ms spend 90 to 120 seconds in query planning.

      While Drill is planning the query, before execution begins, the OpenTSDB incoming query log shows many inefficient queries. For example, there are often 20 to 30 queries asking for all metrics from 47 years ago onward, even though the original query passed to Drill specifies a more recent start time. Each of these queries takes 2-3 seconds to complete with our current small dataset.

      From what I can tell, this is related to how the storage plugin prepares the output columns: it tries to resolve all tags so it can include them as columns. This can be seen in the setupStructure() method called from the Schema constructor.
      (contrib\storage-opentsdb\src\main\java\org\apache\drill\exec\store\openTSDB\client\Schema.java)

      I believe the storage plugin fetches every data point in the requested metric so that it can be confident every tag has an SQL column attributed to it.

      I propose to modify the storage plugin and investigate an alternate way of enumerating all tags within a metric using the OpenTSDB metadata tables. It should be possible to query the metadata for a given metric name and have OpenTSDB return all available tags and values that exist in that metric.

      The API endpoint is /api/search/lookup: http://opentsdb.net/docs/build/html/api_http/search/lookup.html

      This will require the OpenTSDB server either to have 'realtime ts tracking/incrementing' enabled or to run the command 'tsdb uid metasync' on a schedule. Either approach keeps OpenTSDB's metadata tables up to date.
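
To make the proposal concrete, here is a minimal sketch of the lookup approach, not a finished implementation. The class and method names (TagLookupSketch, lookupUri, tagKeys) are hypothetical and do not exist in the plugin. The sketch assumes the GET form of /api/search/lookup with its m, limit and useMeta query parameters, and assumes the "results" array of the response has already been deserialized so that each entry's "tags" object is available as a Map.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the existing plugin.
final class TagLookupSketch {

  // Builds the GET form of /api/search/lookup for a single metric.
  // Assumption: useMeta=true asks OpenTSDB to answer from its meta table
  // instead of scanning the data table. Requires Java 10+ for this
  // URLEncoder.encode overload.
  static URI lookupUri(String baseUrl, String metric, int limit) {
    String m = URLEncoder.encode(metric, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/api/search/lookup?m=" + m
        + "&limit=" + limit + "&useMeta=true");
  }

  // Collects the union of tag keys over all time series returned by the
  // lookup. Each map is the "tags" object of one entry in "results".
  // The sorted union is what the plugin would expose as SQL columns.
  static SortedSet<String> tagKeys(List<Map<String, String>> resultTags) {
    SortedSet<String> keys = new TreeSet<>();
    for (Map<String, String> tags : resultTags) {
      keys.addAll(tags.keySet());
    }
    return keys;
  }
}
```

With this shape, planning would need one lookup request per metric rather than a full scan of the metric's data points. Note that the lookup response is paged by limit, so a real implementation would have to handle metrics with more time series than one page returns.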

       

      Further, it may be possible to pass tag filters from the Drill SQL query through to OpenTSDB, which could further improve query speed. At present, if the end user knows which tag they want to filter on and uses an SQL WHERE <tag> = <value> clause, the filtering happens inside Drill after it has obtained the unfiltered dataset from OpenTSDB, even though OpenTSDB could do the filtering itself.
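
As a rough illustration of what passing such a filter to OpenTSDB could look like, the sketch below renders equality filters into the aggregator:metric{tag=value} form that OpenTSDB accepts in the m parameter of /api/query. The class and method names (TagFilterSketch, metricParam) are hypothetical, and how the WHERE clause would be extracted from Drill's plan is left out entirely.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the existing plugin.
final class TagFilterSketch {

  // Renders "aggregator:metric{tag=value,...}" for the m= parameter of
  // /api/query. tagEquals holds the WHERE <tag> = <value> pairs that were
  // pushed down; an empty map yields a query with no tag filter.
  static String metricParam(String aggregator, String metric,
                            Map<String, String> tagEquals) {
    StringBuilder sb = new StringBuilder(aggregator).append(':').append(metric);
    if (!tagEquals.isEmpty()) {
      StringJoiner tags = new StringJoiner(",", "{", "}");
      // Sort by tag name so the generated query string is deterministic.
      new TreeMap<>(tagEquals).forEach((k, v) -> tags.add(k + "=" + v));
      sb.append(tags.toString());
    }
    return sb.toString();
  }
}
```

For example, a filter on host = web01 for the metric sys.cpu.user with the sum aggregator would be rendered as sum:sys.cpu.user{host=web01}, so OpenTSDB returns only the matching series instead of the full dataset.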

       

      I will open a pull request once I have a base implementation ready, though I am interested in any comments, feedback or discussion.

            People

            • Assignee:
              Unassigned
            • Reporter:
              iacon Nicholas Iacobucci