[YARN-5814] Add druid as storage backend in YARN Timeline Service - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0-alpha2
Fix Version/s: None
Component/s: ATSv2
Labels: None
Target Version/s: 3.5.0

Description

Introduction

I propose adding Druid as a storage backend for YARN Timeline Service.

In Alibaba's clusters, which have thousands of nodes, we run more than 6000 applications and generate 450 million metrics daily. We need to collect and store meta/events/metrics data, analyze utilization reports online across various dimensions, and display trends in allocated and used cluster resources by joining and aggregating data. Tracking resource utilization in this way helps us manage and optimize the cluster.

To achieve this, we switched our storage from HBase to Druid and have had sub-second OLAP performance in our production environment for a few months.

Analysis

Currently, YARN Timeline Service only supports aggregating metrics a) at the flow level, by FlowRunCoprocessor, and b) at the application level, by AppLevelTimelineCollector. Offline (time-based periodic) aggregation for flows/users/queues for reporting and analysis is planned but not yet implemented. YARN Timeline Service uses Apache HBase as its primary storage backend, and HBase is not well suited to OLAP workloads.

For arbitrary exploration of data, such as online analysis of utilization reports across various dimensions (Queue, Flow, Users, Application, CPU, Memory) by joining and aggregating data, Druid's custom column format enables ad-hoc queries without pre-computation. The format also enables fast scans on columns, which is important for good aggregation performance.
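
To make the ad-hoc aggregation concrete, here is a minimal sketch of how a utilization report could be expressed as a Druid native `groupBy` query. The query fields (`queryType`, `dataSource`, `granularity`, `dimensions`, `aggregations`, `intervals`) follow Druid's native query format, but the data source name `yarn_timeline_metrics` and the column names `queue`, `user`, and `memory_mb` are assumptions made for illustration and are not taken from the attached design.

```python
import json
from typing import Any, Dict, List


def build_utilization_query(
    dimensions: List[str],
    metric: str,
    start: str,
    end: str,
    granularity: str = "hour",
    data_source: str = "yarn_timeline_metrics",  # hypothetical data source name
) -> Dict[str, Any]:
    """Build a groupBy query that sums one metric per dimension combination."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": granularity,
        # Any subset of dimensions can be chosen at query time.
        "dimensions": list(dimensions),
        "aggregations": [
            {"type": "doubleSum", "name": metric, "fieldName": metric}
        ],
        # ISO-8601 interval in the form start/end.
        "intervals": [f"{start}/{end}"],
    }


if __name__ == "__main__":
    # Column names below are assumed, not defined by this issue.
    query = build_utilization_query(
        ["queue", "user"],
        "memory_mb",
        "2016-11-01T00:00:00Z",
        "2016-11-02T00:00:00Z",
    )
    print(json.dumps(query, indent=2))
```

Changing the `dimensions` list (for example to `["flow"]`) yields a different report without any new pre-computed aggregation, which is the property the paragraph above relies on.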

To support online analysis of utilization reports across various dimensions, display of trends in allocated and used cluster resources, and arbitrary exploration of data, we propose adding Druid storage and implementing DruidWriter/DruidReader in YARN Timeline Service.
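
As a rough illustration of the kind of transformation a DruidWriter might perform, the sketch below flattens one timeline entity into flat event rows, each carrying a timestamp, a set of dimensions, and a single metric value. The entity layout (a dict with `metrics`, each holding an `id` and a `values` map from timestamp to number) and the dimension names are assumptions for this example. The actual DruidWriter design is described in the attached PDF and may differ.

```python
from typing import Any, Dict, List

# Hypothetical dimension fields copied from the entity onto every row.
DIMENSIONS = ("cluster", "queue", "user", "flow", "app_id")


def flatten_entity(entity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one entity's metric time series into flat, ingestible rows."""
    dims = {name: entity.get(name) for name in DIMENSIONS}
    rows: List[Dict[str, Any]] = []
    for metric in entity.get("metrics", []):
        # Emit one row per (metric, timestamp) pair, oldest first.
        for ts in sorted(metric["values"]):
            row = dict(dims)
            row["timestamp"] = ts
            row["metric"] = metric["id"]
            row["value"] = metric["values"][ts]
            rows.append(row)
    return rows
```

Keeping every row flat, with dimensions repeated on each one, matches a column-oriented store where each dimension becomes its own column and can be filtered or grouped independently.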

Attachments

Add-Druid-in-YARN-Timeline-Service.pdf
11/Nov/16 03:48
132 kB
Bingxue Qiu

Issue Links

relates to

YARN-5355 YARN Timeline Service v.2: alpha 2

Resolved

People

Assignee: Unassigned

Reporter: Bingxue Qiu

Votes: 0

Watchers: 25

Dates

Created: 02/Nov/16 02:16

Updated: 04/Jan/24 08:56