In older versions of Kudu, tables may suffer from
KUDU-1400 when workloads are sequential and have a low ingest rate (e.g. KBs per minute). Today, the way to identify the issue is to notice that scans of a specific tablet are slow relative to the amount of data being scanned, look at that tablet's rowset layout diagram, and confirm that the tablet has a large number of small rowsets. The usual guidance is then to rewrite the table at a higher ingest rate (e.g. via an Impala CTAS), which typically resolves the issue for that table.
Users may want a way to identify tables (not just individual tablets) that suffer from this issue so those tables can be rewritten. It would be nice to document how to do this using existing tooling (or to make a script available).
The kudu fs list tool with the rowset-id column filter seems like it would be useful for identifying how many rowsets a tablet has and how large they are. Combined with kudu table list, users ought to be able to identify affected tables easily.
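As a rough sketch of how existing tooling could be combined for this, the script below counts distinct rowsets per tablet from kudu fs list output. The data-directory paths are illustrative assumptions, and it assumes a recent Kudu release where kudu fs list supports --columns and --format; it must be run locally on a tablet server with the server offline or the directories readable.

```shell
#!/bin/sh

# count_rowsets: aggregate "tablet-id<TAB>rowset-id" lines into a
# per-tablet count of distinct rowsets, one "tablet count" line each.
count_rowsets() {
  sort -u |                                  # one line per (tablet, rowset) pair
    awk '{count[$1]++} END {for (t in count) print t, count[t]}' |
    sort
}

# list_tablet_rowsets: emit (tablet-id, rowset-id) pairs for every tablet
# on this server. The fs paths below are assumptions; point them at your
# deployment's actual WAL and data directories.
list_tablet_rowsets() {
  kudu fs list --fs_wal_dir=/var/lib/kudu/tserver \
               --fs_data_dirs=/var/lib/kudu/tserver \
               --columns=tablet-id,rowset-id --format=tsv 2>/dev/null
}

# Usage: list_tablet_rowsets | count_rowsets
# Tablets with hundreds or thousands of rowsets are candidates for KUDU-1400;
# cross-reference the tablet IDs against kudu table list output to find the
# owning tables.
```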
In later versions of Kudu, the num_rowsets_on_disk metric should be useful for identifying such cases: divide the tablet's on-disk size by num_rowsets_on_disk and compare the result against 32MB, the current default target DRS size.
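The metric-based check above amounts to simple arithmetic. The helper below is a sketch of that calculation; the 8 MiB "suspect" cutoff is an illustrative assumption (well under the 32 MiB default target), not a value from Kudu itself.

```shell
#!/bin/sh

# flag_small_rowsets TABLET_SIZE_BYTES NUM_ROWSETS_ON_DISK
# Computes the average rowset size and flags tablets whose average is far
# below the 32 MiB default target DRS size. The 8 MiB cutoff is a
# heuristic chosen for illustration; tune it for your workload.
flag_small_rowsets() {
  size_bytes=$1
  num_rowsets=$2
  avg_mb=$(( size_bytes / num_rowsets / 1024 / 1024 ))
  if [ "$avg_mb" -lt 8 ]; then
    echo "suspect (avg ${avg_mb} MiB/rowset)"
  else
    echo "ok (avg ${avg_mb} MiB/rowset)"
  fi
}

# Example: a 1 GiB tablet split across 512 rowsets averages 2 MiB per
# rowset, a likely KUDU-1400 symptom:
#   flag_small_rowsets 1073741824 512
```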