  1. Phoenix
  2. PHOENIX-3718 Provide user-level documentation for column encoding feature
  3. PHOENIX-3559

More disk space used with encoded column scheme with data in sparse columns

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Information Provided
    • None
    • 4.11.0
    • None
    • None

    Description

      Schema with 5K columns

      create table (k1 integer, k2 integer, c1 varchar ... c5000 varchar CONSTRAINT PK PRIMARY KEY (K1, K2)) 
      VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
      

      In this schema, only 100 randomly chosen columns are filled, each with a random 15-character value. The rest are null.

      Data size is about 6x larger with the encoded column scheme than with the non-encoded scheme: 12 GB per 1M rows encoded vs. ~2 GB per 1M rows non-encoded.
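
The reported sizes can be turned into per-row figures with a little arithmetic. The sketch below is only a back-of-the-envelope estimate, not a measurement. It assumes decimal gigabytes (1 GB = 10^9 bytes), one byte per character, and that all 100 populated columns apply to every row. None of these assumptions is stated in the issue, and the variable names are illustrative.

```python
# Figures taken from the issue description.
ROWS = 1_000_000
TOTAL_COLUMNS = 5000            # c1 .. c5000
FILLED_COLUMNS = 100            # populated columns (assumed to be per row)
VALUE_CHARS = 15                # characters per populated value

# Assumption: decimal gigabytes and 1 byte per character.
ENCODED_BYTES = 12 * 10**9      # 12 GB per 1M rows, encoded
NON_ENCODED_BYTES = 2 * 10**9   # ~2 GB per 1M rows, non-encoded

encoded_per_row = ENCODED_BYTES / ROWS
non_encoded_per_row = NON_ENCODED_BYTES / ROWS
payload_per_row = FILLED_COLUMNS * VALUE_CHARS
ratio = ENCODED_BYTES / NON_ENCODED_BYTES

# Bytes beyond the raw payload, spread over every declared column.
extra_per_declared_column = (encoded_per_row - payload_per_row) / TOTAL_COLUMNS

print(f"encoded:     {encoded_per_row:.0f} bytes/row")
print(f"non-encoded: {non_encoded_per_row:.0f} bytes/row")
print(f"raw payload: {payload_per_row} bytes/row")
print(f"ratio:       {ratio:.1f}x")
print(f"extra bytes per declared column (encoded): {extra_per_declared_column:.2f}")
```

Under these assumptions the encoded layout spends roughly 12,000 bytes per row on about 1,500 bytes of payload, or about 2 extra bytes for each of the 5,000 declared columns. That would be consistent with a cost paid for every declared column rather than only for populated ones, although the issue itself does not state the cause.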

      With GZ compression, the size with the encoded column scheme is still 35% higher.

          People

            samarthjain Samarth Jain
            mujtabachohan Mujtaba Chohan
