[KUDU-2671] Change hash number for range partitioning - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.0
Fix Version/s: 1.17.0
Component/s: client, java, master, server
Labels:

Flags: Important

Description

For our use case, Kudu's schema design isn't flexible enough.

We range-partition our tables by day, for example dt='20181112', as we would a Hive table.

But our data size changes a lot from day to day: one day may hold 50 GB, while another holds 500 GB. This makes it hard to choose the hash schema. If the number of hash buckets is too large, it is wasteful in most cases. If it is too small, performance suffers on days with a large amount of data.

So we propose a solution: allow the number of hash buckets to be changed for new range partitions, based on a table's historical data.

For example:

We create the schema with one estimated value.
We collect the data size for each day range.
We create each new day range partition with a hash bucket count chosen from the collected daily sizes.
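
The sizing step above can be sketched as follows. This is only an illustration of the idea, not the code attached to this issue: the helper names `estimate_next_day_bytes` and `buckets_for_size`, the 10 GiB-per-bucket target, the 7-day window, and the 2..64 bucket limits are all assumptions chosen for the example.

```python
import math

GIB = 1024 ** 3


def estimate_next_day_bytes(history_bytes, window=7):
    """Estimate the next day's size as the largest of the last `window` days.

    `history_bytes` is a list of collected per-day sizes, oldest first.
    Taking the recent maximum is an assumption; any estimator would do.
    """
    recent = history_bytes[-window:]
    if not recent:
        raise ValueError("no history collected yet")
    return max(recent)


def buckets_for_size(data_bytes, target_bytes_per_bucket=10 * GIB,
                     min_buckets=2, max_buckets=64):
    """Pick a hash bucket count for a new day range partition.

    The count grows with the expected data size and is clamped to
    [min_buckets, max_buckets]. All three limits are hypothetical defaults.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    wanted = math.ceil(data_bytes / target_bytes_per_bucket)
    return max(min_buckets, min(max_buckets, wanted))


if __name__ == "__main__":
    # Collected daily sizes for one table, oldest first (made-up values).
    history = [50 * GIB, 60 * GIB, 500 * GIB, 55 * GIB]
    expected = estimate_next_day_bytes(history)
    print(buckets_for_size(expected))
```

With these assumed defaults, a 50 GiB day would get 5 buckets and a 500 GiB day would get 50, so small days no longer pay for the bucket count that large days need.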

We have used this feature for half a year, and it works well. We hope it will be useful to the community. The solution may not be complete, so please help us make it better.

Attachments


屏幕快照 2019-01-24 下午12.03.41.png
24/Jan/19 04:04
72 kB
yangz

Issue Links

is duplicated by

KUDU-3069 Support to alter the number of hash buckets for newly added range partitions

Resolved

is related to

KUDU-3541 document KUDU-2671: Change hash number for range partitioning

Open

relates to

KUDU-3388 Automatically update information on partition schema

Open


People

Assignee: Mahesh Reddy

Reporter: yangz

Votes: 3

Watchers: 14

Dates

Created: 24/Jan/19 03:59

Updated: 04/Jan/24 15:17

Resolved: 10/Aug/22 02:05