Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: None
    • Fix Version/s: 4.11.0
    • Labels: None

      Description

      Schema with 5K columns

      -- T is a placeholder table name
      CREATE TABLE T (K1 INTEGER NOT NULL, K2 INTEGER NOT NULL, C1 VARCHAR, ... C5000 VARCHAR
          CONSTRAINT PK PRIMARY KEY (K1, K2))
          VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
      

      In this schema, only 100 randomly chosen columns are filled with random 15-character values; the rest are null.

      Data size is 6x larger with the encoded column scheme compared to non-encoded: roughly 12GB per 1M rows encoded vs ~2GB per 1M rows non-encoded.

      When compressed with GZ, the size with the encoded column scheme is still 35% higher.
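
      For reference, here is a minimal sketch of how the encoded and non-encoded variants being compared could be declared, assuming Phoenix 4.10+ where the COLUMN_ENCODED_BYTES table property is available (0 turns column-name encoding off) and HBase properties such as COMPRESSION can be passed through the DDL. The table names and the shortened column list are hypothetical, and properties not relevant to the comparison are omitted.

      -- Encoded column names (the default for new tables since 4.10)
      CREATE TABLE T_ENCODED (K1 INTEGER NOT NULL, K2 INTEGER NOT NULL,
          C1 VARCHAR, C2 VARCHAR, C3 VARCHAR  -- ...through C5000 in the actual test
          CONSTRAINT PK PRIMARY KEY (K1, K2))
          IMMUTABLE_ROWS=true, COMPRESSION='GZ';

      -- Same shape with column-name encoding turned off, as the non-encoded baseline
      CREATE TABLE T_NON_ENCODED (K1 INTEGER NOT NULL, K2 INTEGER NOT NULL,
          C1 VARCHAR, C2 VARCHAR, C3 VARCHAR
          CONSTRAINT PK PRIMARY KEY (K1, K2))
          IMMUTABLE_ROWS=true, COMPRESSION='GZ', COLUMN_ENCODED_BYTES=0;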

        Activity

        Mujtaba Chohan added a comment -

        A related issue, due to the higher space usage, is that the performance gains for point and aggregate queries with the encoded column scheme compared to non-encoded disappear; performance is almost the same for the case described above.

        James Taylor added a comment -

        The encoding scheme isn't optimized for sparse storage; the idea is to use it when your storage is dense. Potentially you could use the column encoding scheme but still use multiple key values, which would be a good choice for sparse data. You'd also want to use realistic column names for a test like this (instead of c1, c2, c3), as that's where you'd get some space savings. It'd be good to determine where the break-even point is in terms of sparseness.

        We could potentially improve our new storage format for sparse storage, but I'm not sure we'll find one optimum format for both dense and sparse storage. Enabling new storage formats to be defined will be valuable for this reason.
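
        As a concrete illustration of that encoded-names-plus-multiple-key-values combination, here is a minimal sketch, assuming the IMMUTABLE_STORAGE_SCHEME and COLUMN_ENCODED_BYTES table properties from the 4.10+ column mapping work; the table name and the shortened column list are hypothetical.

        -- Encoded (numeric) column qualifiers, but still one cell per column,
        -- so columns that are null in a sparse row take no space
        CREATE TABLE T_SPARSE (K1 INTEGER NOT NULL, K2 INTEGER NOT NULL,
            C1 VARCHAR, C2 VARCHAR, C3 VARCHAR  -- ...through C5000
            CONSTRAINT PK PRIMARY KEY (K1, K2))
            IMMUTABLE_ROWS=true,
            IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
            COLUMN_ENCODED_BYTES=2;  -- 2-byte qualifiers are enough for 5000 columns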

        Ankit Singhal added a comment -

        I did a comparison earlier with a table having all data types (except arrays) and dense records. Posting it here in case it saves Mujtaba Chohan some effort.

        Table (input data size = 121M)    | TABLE_SINGLE_KV (PHOENIX-2565) | TABLE_LARGE_COLUMN_NAME | TABLE_SMALL_COLUMN_NAME
        ----------------------------------|--------------------------------|-------------------------|------------------------
        UPSERT                            | 25.295 sec                     | 47.315 sec              | 46.779 sec
        COUNT                             | 5.95 sec                       | 7.719 sec               | 7.91 sec
        No compression (after compaction) | 183M                           | 182M                    | 182M
        GZ (compression ratio)            | 38M (4.32:1)                   | 44M (2.75:1)            | 41M (2.95:1)
        Snappy (compression ratio)        | 50M (2.42:1)                   | 56M (2.16:1)            | 56M (2.16:1)

        PHOENIX-2565 vs RowkeySchema

        Encoding (input data size = 111M) | Phoenix (single KV, PHOENIX-2565) | Phoenix with new encoding (like RowKey)
        ----------------------------------|-----------------------------------|----------------------------------------
        No compression (encoding ratio)   | 143M (1:0.77)                     | 95M (1.16:1)
        Snappy (compression ratio)        | 51M (2.17:1)                      | 46M (2.41:1)
        Mujtaba Chohan added a comment -

        Sure James Taylor, agreed. I see this is not optimized for sparse columns, but for one of our internal use cases, which is based on schemas driven by customers, encoded columns could potentially be used this way, so it's at least good to know the limits and the break-even point.

        I also tested with slightly longer column names (column_1 ... column_5000) and the comparative data sizes were the same, which might be due to the FAST_DIFF block encoding that we have on by default.

        Thanks Ankit Singhal for those data points.

        James Taylor added a comment -

        Mujtaba Chohan - is this essentially an issue that will be covered by documentation on when to use the column encoding feature?

        Mujtaba Chohan added a comment -

        Correct James Taylor, it's for documentation, and it's somewhat circumvented by using Snappy compression.
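
        For the documentation, here is a minimal sketch of that Snappy workaround, assuming HBase properties can be set through Phoenix DDL; the table name is hypothetical, and existing data only picks up the new codec after a major compaction.

        -- New table with Snappy enabled up front
        CREATE TABLE T (K1 INTEGER NOT NULL, K2 INTEGER NOT NULL, C1 VARCHAR
            CONSTRAINT PK PRIMARY KEY (K1, K2))
            IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY';

        -- Or enable it on an existing table; a major compaction rewrites older store files
        ALTER TABLE T SET COMPRESSION='SNAPPY';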


          People

          • Assignee: Samarth Jain
          • Reporter: Mujtaba Chohan
          • Votes: 0
          • Watchers: 4
